Until a few years ago rank correlation was a rather neglected
branch of the theory of statistical relationship. In the practical
field it was generally regarded, except perhaps by psychologists, as
a makeshift for the correlation of measurable variables; and in the
theoretical field it seemed to present no interesting or important
problems. That situation has changed. Practical applications of
ranking methods are not only being extended in psychology and
education but are being made in other subjects such as industrial
experimentation and economics. The theoretical properties of
order-statistics have received much attention and are throwing
important light on some difficult questions of statistical inference.

The aim of this book is to give an account of the new ranking
techniques for the use of those workers who, from choice or necessity,
have to employ ranked material. I was encouraged to write it by
some of my friends who favour the methods for psychological work,
but the treatment will, I hope, be found useful by other scientific
workers. Most of the results presented herein are available only in
research journals, and some of them are not yet published. I hope
that a connected account in book form will be of general interest to
statisticians and users of statistical methods.

The theory of this subject is, from the mathematical standpoint,
rather complicated. I had to try to meet the needs both of those
readers who are interested solely in applications of the theory and
of those who wish to go to the root of the theory. It is doubtful
whether the methods themselves, in this or any branch of statistics,
can be safely applied without some knowledge of the underlying
theory; on the other hand, few workers in the practical field have the
inclination or the time to master the complicated mathematical
derivation of the formulae they require. What I have done is to
write alternate chapters, one describing the results, their applications
and the basic ideas, the other deriving the mathematical results in
detail. Thus any reader who is not interested in the mathematics
can omit the advanced chapters without seriously interrupting the
continuity of his reading and can refer to them later if he wishes.
This is a novel procedure in statistical text-books, but it seems to
me preferable to the alternative courses of omitting the more advanced
mathematics entirely, of relegating them to inconvenient appendices,
or of interspersing them through the general text.

I am indebted to Dr. H. E. Daniels, Mr. J. W. Whitfield and
Mr. A. K. Gayen, who read the typescript or the page proofs and
made a number of helpful suggestions for improvement. My thanks
are also due to Professor R. A. Fisher and Messrs. Oliver & Boyd
for permission to reproduce Tables 7A, 7B, and 8 from their Statistical
Methods for Research Workers; and to Dr. Milton Friedman and
Professor S. S. Wilks for permission to reproduce Appendix Table 6
from the Annals of Mathematical Statistics. I should be grateful if
any readers who detect errors or obscurities would call my attention
to them.

M. G. K.

LONDON,

August, 1948

THE MEASUREMENT OF RANK CORRELATION


Introductory remarks 


1.1 When a number of individuals are arranged in order according
to some quality which they all possess to a varying degree, they
are said to be ranked. The arrangement as a whole is called a ranking
in which each member has a rank.


1.2 It is customary, but not essential, to denote the ranks by
ordinal numbers 1, 2, . . . n where n is the number of objects. Thus
the object or individual which comes fifth in the ranking has the
rank 5. In the sequel we shall often operate with these numbers as
if they were the cardinals of ordinary arithmetic, adding them,
subtracting them and even multiplying them; and it is of some
importance to realise exactly what such processes mean.


1.3 Suppose, for example, that an object has a rank 5 when the
set of objects is ranked according to some quality A and a rank 8
according to a second quality B. What is implied by saying that the
difference of the ranks is 3? We cannot subtract “fifth” from
“eighth”; but a meaning can be given to the process nevertheless.
To say that the rank according to A is 5 is equivalent to saying that,
in arranging according to A, four members are given priority over our
particular member, or are preferred to it. Similarly, seven members
are preferred in the ranking according to B. Consequently the
number of preferences in the B-ranking exceeds the number in the
A-ranking by 3; and this is not an ordinal number but a cardinal
number, i.e. arises by counting.

This may strike the reader as a precious distinction which is
hardly worth making at the present stage. If so, he can put it aside
until it arises later. He should realise, however, from the outset that
the numerical processes associated with ranking are essentially those
of counting, not of measurement.

1.4 In practice, ranked material can arise in many different
ways, some of which may be briefly mentioned:

(a) Purely as arrangements of objects which are being considered
only by reference to their position in space or time. For instance, if
we arrange a pack of cards in some order and then shuffle them, the
new order is a ranking which may be compared with the old to see
whether the shuffling process is a thorough one. We are interested in
the spatial arrangement alone, and not, for example, in whether some
objects are “greater than” or “less than” others in the intensity
of a common quality.

(b) According to some quality which we cannot measure on any
objective scale. For instance, we might rank a set of mineral
specimens according to “hardness” by some such simple criterion as
saying that A is harder than B if A scratches B when the two are
rubbed together. If A scratches B and B scratches C then A will
scratch C, so that by making a number of comparisons we can rank
the objects without ambiguity (unless two of them are equally hard,
a special case we shall consider in Chapter 8). There is, however, no
method of measuring hardness implicit in this approach. We can
always decide whether A is harder than B, but we cannot say that it
is twice as hard without imposing some scale of measurement on the
system.

(c) According to some measurable or countable quality. For
instance, we may rank individuals according to height, or countries
according to size of population. It may not always be necessary to
carry out the actual measurement in such cases, as for instance, if we
arrange a class of students in order of height “by eye”; but the
quality according to which the ranking is made is capable of practical
measurement.

(d) According to some quality which we believe to be measurable
but cannot measure for practical or theoretical reasons. For instance,
we may rank a number of persons according to “intelligence” on the
assumption that there is such a quality and that individuals can be
ranked according to the degree of intelligence which they possess. In
Chapter 11 we shall consider a method which enables us to investigate
in some cases whether these assumptions are legitimate. The reason
for differentiating this case from that of paragraph (b) is that in the
latter we know from physical considerations that ranking is possible,
whereas in the former the possibility is a hypothesis.


1.5 In the theory of statistics a quantity which may vary from
one member of a population to another is called a variate. In
particular, a measurable quality provides a variate and, of course,
a scale. We can always rank a set of individuals according to their
position on a scale and may then be said to replace variate-values by
ranks. A ranking may then be regarded as a less accurate way of
expressing the ordered relation of the members, less accurate because
it does not tell us how close the various members may be on the scale.
Per contra, what the ranking loses in accuracy it gains in generality,
for if we stretch the scale of measurement (and even if we stretch it
differently in different regions) the ranking remains unaltered; in
mathematical language it is invariant under stretching of the scale.
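The invariance described in 1.5 can be illustrated with a minimal Python sketch. The heights and the particular stretching function below are hypothetical, chosen only for illustration; any strictly increasing transformation of the scale would serve, and the values are assumed to be all different.

```python
import math

def ranks(values):
    """Return the rank (1 = smallest) of each value; values are assumed distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for position, index in enumerate(order, start=1):
        out[index] = position
    return out

# Hypothetical variate-values (heights in cm), not taken from the text.
heights = [172.0, 158.5, 181.2, 165.0, 190.3]

# A strictly increasing but non-uniform "stretch" of the scale.
stretched = [math.exp(h / 50.0) for h in heights]

print(ranks(heights))    # [3, 1, 4, 2, 5]
print(ranks(stretched))  # [3, 1, 4, 2, 5]
```

The variate-values change, and change unequally in different regions of the scale, but the ranking does not.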


1.6 Historically the theory of ranks developed as an offshoot of
the theory of variates. In the early stages ranks were regarded in
the main as makeshifts substituted for variate measurements to save
time or trouble or to avoid the difficulties of setting up an objective
scale. More recently they have been recognised as having an
importance of their own, and in the earlier part of this book we shall
consider ranking problems as such without any reliance on an
underlying scale. Our methods thus have very considerable generality.
The relationship between ranks and variates will be discussed in
Chapters 9 and 10.


Rank correlation


1.7 Suppose a number of boys are ranked according to their
ability in mathematics and in music. Such a pair of rankings for ten
boys, denoted by the letters A to J, might be as follows:

Boy:       A   B   C   D   E   F   G   H   I   J
Maths.:    7   4   3  10   6   2   9   8   1   5
Music:     5   7   3  10   1   9   6   2   8   4        (1.1)

We are interested in whether there is any relationship between
ability in mathematics and music. A glance at these rankings shows
that there is far from being perfect agreement, but that some boys
occupy the same or nearly the same position in both subjects. We
can see the correspondence more easily if we re-arrange one ranking
in the natural order, thus:

Boy:       I   F   C   B   J   E   A   H   G   D
Maths.:    1   2   3   4   5   6   7   8   9  10
Music:     8   9   3   7   4   1   5   2   6  10        (1.2)

What we wish to do is to measure the degree of correspondence
between these two rankings, or to measure the intensity of rank
correlation. We shall accordingly show how to construct a coefficient
for this purpose which will be denoted by the Greek letter τ (tau).


4 3 4 RANK CORRELATION METHODS 


1.8 Such a coefficient should have three properties:

(a) if the agreement between the rankings is perfect, i.e. every
individual has the same rank in both, τ should be +1,
indicating perfect positive correlation;

(b) if the disagreement is perfect, i.e. one ranking is the inverse of
the other, τ should be −1, indicating perfect negative
correlation;

(c) for other arrangements τ should lie between these limiting
values; and in some acceptable sense increasing values from
−1 to +1 should correspond to increasing agreement between
the ranks (see References to Chapter 2).


The first two of these requirements are only conventions, but are 
by far the most useful conventions to employ. 


1.9 In the first ranking of (1.1) consider any pair of members,
say AB. Their ranks, 7, 4, occur in the inverse order (taking the
natural order 1 . . . 10 as the correct order) and hence we will
score for this pair the value −1. Had the pair been in the right
order we should have scored +1. In the second ranking the pair AB
has ranks 5, 7, which is in the right order, and we will, therefore,
score +1 in this ranking.

We now multiply the scores for this pair in both rankings and
hence arrive at the score −1. Evidently, for any pair, the resulting
score is +1 if their ranks are in the same order, −1 if they are
in different orders. We may say that we score +1 or −1 according
as the pair agree or disagree in the two rankings.

The same procedure is followed for each possible pair from the
ranking of 10. There are 45 such pairs and the scores are as follows
(we write them down in full so that the reader can follow the method,
but in practice, as we show presently, this is unnecessary):

Pair  Score     Pair  Score     Pair  Score     Pair  Score
AB     −1       BF     −1       DE     +1       FH     −1
AC     +1       BG     −1       DF     +1       FI     +1
AD     +1       BH     −1       DG     +1       FJ     −1
AE     +1       BI     −1       DH     +1       GH     +1
AF     −1       BJ     −1       DI     +1       GI     −1
AG     +1       CD     +1       DJ     +1       GJ     +1
AH     −1       CE     −1       EF     −1       HI     −1
AI     −1       CF     −1       EG     +1       HJ     −1
AJ     +1       CG     +1       EH     +1       IJ     −1
BC     +1       CH     −1       EI     −1
BD     +1       CI     −1       EJ     −1
BE     −1       CJ     +1       FG     −1


The total of positive scores, say P, is 21 and that of negative
scores, say −Q, is −24. Adding these two we arrive at a total score,
say S, of −3.

Now if the rankings are identical each of the 45 unit scores will
be positive and hence the maximum value of S is 45. Similarly
the minimum value of S is −45. We therefore calculate τ as

\[
\tau = \frac{\text{Actual score}}{\text{Maximum possible score}} = \frac{-3}{45} = -0.07.
\]

This is near to zero and indicates very little correlation between the
two rankings. A zero value may, in fact, be regarded as indicative
of independence, halfway, so to speak, between complete positive
dependence and complete negative dependence.
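The pair-by-pair scoring of 1.9 can be sketched in Python as follows. This is only an illustration of the procedure described above, using the rankings of (1.1); the function names are our own and the rankings are assumed to contain no ties.

```python
from itertools import combinations

# Rankings (1.1): ability in mathematics and in music for boys A to J.
maths = {"A": 7, "B": 4, "C": 3, "D": 10, "E": 6, "F": 2, "G": 9, "H": 8, "I": 1, "J": 5}
music = {"A": 5, "B": 7, "C": 3, "D": 10, "E": 1, "F": 9, "G": 6, "H": 2, "I": 8, "J": 4}

def sign(x):
    """Return +1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)

def pair_scores(first, second):
    """Score +1 for a pair in the same order in both rankings, -1 otherwise."""
    return {
        a + b: sign(first[b] - first[a]) * sign(second[b] - second[a])
        for a, b in combinations(sorted(first), 2)
    }

scores = pair_scores(maths, music)
P = sum(1 for s in scores.values() if s > 0)   # number of agreeing pairs
Q = sum(1 for s in scores.values() if s < 0)   # number of disagreeing pairs
S = P - Q                                      # total score
tau = S / len(scores)                          # 45 pairs for n = 10

print(P, Q, S, round(tau, 2))  # 21 24 -3 -0.07
```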


1.10 Consider now the general case. If we have two rankings
of n, the number of pairs of comparisons which can be made is equal
to the number of ways of choosing two things from n, which is
½n(n − 1), sometimes written as \(\binom{n}{2}\). This is the maximum
value of the score, attained when the rankings are identical. If S is
the total score we define the correlation coefficient by

\[
\tau = \frac{S}{\tfrac{1}{2}n(n-1)}. \tag{1.3}
\]

If P and −Q are the positive and negative scores, so that S = P − Q,
we have the equivalent forms (since P + Q = ½n(n − 1))

\begin{align}
\tau &= \frac{P - Q}{\tfrac{1}{2}n(n-1)} \tag{1.4} \\
     &= 1 - \frac{2Q}{\tfrac{1}{2}n(n-1)} \tag{1.5} \\
     &= \frac{2P}{\tfrac{1}{2}n(n-1)} - 1. \tag{1.6}
\end{align}


1.11 The determination of the score S (or equivalently of P or
of Q) does not require the detailed procedure we have followed above.
There are several short-cut methods of which the following are
probably the easiest.

(a) Consider the rearranged form of (1.2). When one ranking is
in the natural order, 1, 2, . . . n, all unit scores arising from it are
positive. Consequently, the contributions to P will arise only from
pairs in the second ranking which are in the right order. These are
all we need to count. The second ranking is

8   9   3   7   4   1   5   2   6  10

Considering first the pairs associated with the first member 8, we see
that there are two members greater than 8 on the right of it. The
contribution to P is therefore +2. Taking now pairs associated
with 9 (other than 8 9 which has already been taken into account)
we find a contribution to P of +1. Similarly the contribution of
pairs associated with 3, arising from members to the right of it, is +5.
Proceeding in this way we find

P = 2 + 1 + 5 + 1 + 3 + 4 + 2 + 2 + 1 = 21.

Hence, from (1.6),

τ = 42/45 − 1 = −0.07, as before.

(b) If it is too troublesome to rearrange the rankings so that one
of them is in the right order we may proceed thus. In the rankings
previously considered write down the natural order above them, thus:

      1   2   3   4   5   6   7   8   9  10
A:    7   4   3  10   6   2   9   8   1   5
B:    5   7   3  10   1   9   6   2   8   4

The number 1 in ranking B has a 6 above it in ranking A. In
the natural ranking 6 has four members to the right. Score 4 and
delete the 6 from the natural ranking. Now the number 2 in B has
an 8 above it in A and in the natural ranking 8 has two members to
the right. Score 2 and strike out the 8 from the natural ranking.
Proceeding in this way we find scores of

4 + 2 + 5 + 3 + 2 + 1 + 1 + 2 + 1 + 0 = 21,

which gives the value of P, as found above.

The validity of this rule is evident if we rearrange the rankings
so as to put B in the natural order (this being the order in which we
have considered the members):

A:    6   8   3   5   7   9   4   1   2  10
B:    1   2   3   4   5   6   7   8   9  10

The contributions to P by method (b) are the same as those of
method (a) applied to the rankings B and A. There are, for example,
4 members to the right of 6 in A which are greater than 6; 2
to the right of 8 in A which are greater than 8; and so on.


1.12 It may help to give some idea of the values assumed by τ
in particular cases if we set out some rankings of 10 and the
corresponding τ obtained by correlating them with the natural order.
The reader should check these values as an exercise.

          Ranking                                  Value of τ
a     4   7   2  10   3   6   8   1   5   9         + 0.11
b     1   6   2   7   3   8   4   9   5  10         + 0.56
c     7  10   4   1   6   8   9   5   2   3         − 0.24
d     6   5   4   7   3   8   2   9  10   1         + 0.02
e    10   1   2   3   4   5   6   7   8   9         + 0.60
f    10   9   8   7   6   1   2   3   4   5         − 0.56

τ as a coefficient of disarray

1.13 The coefficient as we have introduced it provides a kind of
average measure of the agreement between pairs of members
(“agreement”, that is to say, in respect of order) and thus has evident
recommendations as a measure of the concordance between two
rankings. There is another instructive way of looking at the
coefficient. Consider the two rankings of 7:

A:    1   2   3   4   5   6   7
B:    6   3   5   7   1   2   4

We may transform B into A by successively interchanging pairs of
neighbours. For instance, interchanging 1 with its left-hand
neighbours, we have, in four stages,

6   3   5   1   7   2   4
6   3   1   5   7   2   4
6   1   3   5   7   2   4
1   6   3   5   7   2   4

Now interchanging 2 we find, after four more stages,

1   2   6   3   5   7   4

Interchanging the 3 and 6 gives us

1   2   3   6   5   7   4

Interchanging with the 4 gives us, in three stages,

1   2   3   4   6   5   7

Finally, interchanging the 6 and 5 gives us the natural order A.

This transformation has taken 13 moves, and we could not have
made it in fewer. We might have taken more, as for instance if we
had interchanged 1 and 2 and back again before making the above
sequence of moves. It will be clear that there is a minimum number
of moves which transform any ranking into any other ranking of the
same number of members. Call this number s.


Then we shall show in the next chapter that

\[
s = Q, \quad\text{or equivalently}\quad s = \tfrac{1}{2}\left\{\tfrac{1}{2}n(n-1) - S\right\}, \tag{1.7}
\]

which gives a simple relation between the number of interchanges s
and the negative score Q or the total score S. In the example we
have just considered S = −5, n = 7, and hence

s = ½(21 + 5) = 13, as found.

From (1.5) and (1.7) it follows that

\[
\tau = 1 - \frac{2s}{\tfrac{1}{2}n(n-1)}, \tag{1.8}
\]

exhibiting τ as a simple function of the minimum number of
interchanges between neighbours required to transform one ranking into
the other; in short, as a kind of coefficient of disarray.
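The relation s = Q can be checked numerically with a short Python sketch. It counts the interchanges of neighbours made by a simple bubble sort, which never makes more interchanges than there are pairs in the wrong order. The ranking B is taken here as 6 3 5 7 1 2 4, our reading of the example in 1.13, and the function name is our own.

```python
def adjacent_interchanges(ranking):
    """Sort by swapping neighbours only (bubble sort) and count the swaps made."""
    work = list(ranking)
    swaps = 0
    for end in range(len(work) - 1, 0, -1):
        for i in range(end):
            if work[i] > work[i + 1]:
                work[i], work[i + 1] = work[i + 1], work[i]
                swaps += 1
    return swaps

B = [6, 3, 5, 7, 1, 2, 4]      # ranking B of 1.13, as read from the text
n = len(B)

s = adjacent_interchanges(B)
# Q counted directly: pairs standing in the wrong order relative to 1, 2, ..., n.
Q = sum(1 for i in range(n) for j in range(i + 1, n) if B[i] > B[j])
tau = 1 - 2 * s / (n * (n - 1) / 2)    # form (1.8)

print(s, Q, round(tau, 2))  # 13 13 -0.24
```

Each interchange of neighbours that are in the wrong order removes exactly one disagreeing pair, which is why the count of swaps coincides with Q.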


Spearman's ρ


1.14 We now discuss another coefficient of rank correlation
denoted by the Greek letter ρ (rho) and named after C. Spearman,
who introduced it into psychological work. Consider again the two
rankings of 10 given in (1.1):

Mathematics:       7   4   3  10   6   2   9   8   1   5
Music:             5   7   3  10   1   9   6   2   8   4
Differences d:     2  −3   0   0   5  −7   3   6  −7   1
d²:                4   9   0   0  25  49   9  36  49   1

We have subtracted the ranks for music from those for mathematics
and shown the results in the row called “Differences d”. These
differences should sum to zero (which provides an arithmetical check)
because the sum is the difference of two quantities each of which is
the sum of the numbers 1 to 10. We have also shown the squares
of these differences. Denoting now the sum of these squares by
S(d²) we define Spearman's ρ by the equation

\[
\rho = 1 - \frac{6\,S(d^2)}{n^3 - n}, \tag{1.9}
\]

or, in our present example,

\[
\rho = 1 - \frac{6 \times 182}{990} = -0.103.
\]
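The calculation of 1.14 may be sketched in Python as follows, using the rankings of (1.1). The function name is our own; the check that the differences sum to zero is the arithmetical check mentioned in the text.

```python
maths = [7, 4, 3, 10, 6, 2, 9, 8, 1, 5]
music = [5, 7, 3, 10, 1, 9, 6, 2, 8, 4]

def spearman_rho(first, second):
    """Return S(d^2) and rho as defined in (1.9); untied rankings are assumed."""
    n = len(first)
    d = [a - b for a, b in zip(first, second)]
    if sum(d) != 0:
        raise ValueError("rank differences should sum to zero")
    sum_d2 = sum(x * x for x in d)
    return sum_d2, 1 - 6 * sum_d2 / (n ** 3 - n)

sum_d2, rho = spearman_rho(maths, music)
print(sum_d2, round(rho, 3))  # 182 -0.103
```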


1.15 When two rankings are identical all the differences d are
zero and from (1.9) it follows that ρ = 1. We will now prove that
when one ranking is the reverse of the other, ρ = −1.

Suppose that n is odd, and is equal to 2m + 1. We lose no
generality by writing one ranking in the natural order, and the
rankings and differences may then be expressed as follows:

\[
\begin{array}{lccccccccc}
A: & 1, & 2, & \ldots & m, & m+1, & m+2, & \ldots & 2m, & 2m+1 \\
B: & 2m+1, & 2m, & \ldots & m+2, & m+1, & m, & \ldots & 2, & 1 \\
d: & -2m, & -(2m-2), & \ldots & -2, & 0, & 2, & \ldots & 2m-2, & 2m
\end{array}
\tag{1.10}
\]

The sum of squares is thus given by

\begin{align*}
S(d^2) &= 8\{m^2 + (m-1)^2 + \ldots + 2^2 + 1^2\} \\
       &= 8m(m+1)(2m+1)/6 \\
       &= \tfrac{1}{3}(n-1)(n+1)n \\
       &= \tfrac{1}{3}(n^3 - n).
\end{align*}

If we substitute this value in (1.9) we find

\[
\rho = 1 - \tfrac{6}{3} = -1.
\]

If n is even, say equal to 2m, we have similarly

\[
\begin{array}{lcccccccc}
A: & 1, & 2, & \ldots & m, & m+1, & \ldots & 2m-1, & 2m \\
B: & 2m, & 2m-1, & \ldots & m+1, & m, & \ldots & 2, & 1 \\
d: & -(2m-1), & -(2m-3), & \ldots & -1, & 1, & \ldots & 2m-3, & 2m-1
\end{array}
\]

Thus

\begin{align*}
S(d^2) &= 2\{(2m-1)^2 + (2m-3)^2 + \ldots + 3^2 + 1^2\} \\
       &= 2\{(2m)^2 + (2m-1)^2 + (2m-2)^2 + \ldots + 3^2 + 2^2 + 1^2\} \\
       &\qquad - 2\{(2m)^2 + (2m-2)^2 + \ldots + 2^2\} \\
       &= 4m(2m+1)(4m+1)/6 - 8m(m+1)(2m+1)/6 \\
       &= \tfrac{1}{3}(n^3 - n)
\end{align*}

and, as before, if we substitute in (1.9) we find ρ = −1.

The coefficient ρ thus obeys our general requirements that its
possible values should range from −1 to +1, assuming the extreme
values only when there is perfect disagreement or agreement between
the rankings.
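The result of 1.15 can be checked numerically for particular values of n, odd and even. The following Python sketch (function name our own, values of n chosen arbitrarily) compares S(d²) for a ranking and its reverse with (n³ − n)/3.

```python
def sum_of_squared_differences(n):
    """S(d^2) between the natural order 1..n and its reverse n..1."""
    natural = range(1, n + 1)
    reverse = range(n, 0, -1)
    return sum((a - b) ** 2 for a, b in zip(natural, reverse))

for n in (7, 10, 25):                      # one small odd, one even, one larger odd
    s_d2 = sum_of_squared_differences(n)
    assert 3 * s_d2 == n ** 3 - n          # S(d^2) = (n^3 - n)/3
    rho = 1 - 6 * s_d2 / (n ** 3 - n)      # form (1.9)
    print(n, s_d2, rho)
```

Such a check is of course no substitute for the proof, but it guards against slips in the algebra.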


1.16 The reason for taking the sum of squares of the
rank-differences will be clear to the reader who is familiar with the
calculation of statistical measures of dispersion such as the standard
deviation. It is obvious enough that we cannot base a coefficient
on the sum of differences S(d), for this is zero. It might be thought,
however, that if we disregarded the signs of the differences and
summed them a somewhat simpler coefficient could be reached; and
indeed this was one of Spearman's original suggestions. The
procedure leads to difficulties in the more advanced theory,
particularly in connection with sampling questions, and we shall not
pursue it. See 2.11 and 2.12.


Conjugate rankings 


1.17 One property which is common to both τ and ρ may be
noticed. If we correlate a given ranking A with the natural order
B (1, . . . n), and again with the inverse order B′ (n, . . . 1), the
values of τ are the same in magnitude but opposite in sign. This will
be seen from the definition of τ, for the effect of reversing the order B
is to reverse the sign of each unit score contributing to S, and hence
the sign of S itself. Thus, corresponding to any two rankings A and
B (not necessarily natural orders) with correlation τ there will be
rankings A and B′ with correlation −τ. Consider, for instance, the
two rankings of 7:

A:    [ranking not legible in the source]
B:    [ranking not legible in the source]               (1.12)

Let us rearrange A in the natural order. We then have

A:    1   2   3   4   5   6   7
B:    [ranking not legible in the source]

and the value of τ is readily found to be −11/21 or −0.52. If we
now invert the natural order, giving

C:    7   6   5   4   3   2   1
D:    [ranking not legible in the source]

we find for the correlation of C and D a value of +0.52. Rearranging
so that D becomes B of (1.12), we have

A′:   [ranking not legible in the source]
B:    [ranking not legible in the source]               (1.13)

The rankings A and A′ may be regarded as conjugate. When
correlated with B they give values of τ which are equal in magnitude
but opposite in sign.


1.18 It is also true, though not so obvious, that the values of ρ
between A, B and A′, B are equal in magnitude but opposite in sign.
We shall prove the general result in the next chapter. The reader
may verify as an exercise that for the particular cases (1.12) and (1.13)
we have respectively ρ = −17/28 and ρ = +17/28.


1.19 In a sense, therefore, the scales of measurement of rank
correlation set up by the use of τ or ρ are symmetrical about the
value zero. They range from −1 to +1, and corresponding to any
given positive value of τ or of ρ there is a negative value of the
same magnitude arising from an inversion of one of the rankings.
The scales, we may say, are unbiassed.
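The symmetry about zero can be illustrated with a small Python sketch. Since any pair of untied rankings will serve, we use the mathematics and music rankings of (1.1) and form the conjugate of one of them by replacing each rank r by n + 1 − r. The function names are our own.

```python
import math

def tau(a, b):
    """Kendall's tau from the pair scores; untied rankings are assumed."""
    n = len(a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += 1 if (a[i] - a[j]) * (b[i] - b[j]) > 0 else -1
    return s / (n * (n - 1) / 2)

def rho(a, b):
    """Spearman's rho from the sum of squared rank-differences, form (1.9)."""
    n = len(a)
    return 1 - 6 * sum((x - y) ** 2 for x, y in zip(a, b)) / (n ** 3 - n)

def conjugate(ranking):
    """Invert a ranking: the first member becomes the last, and so on."""
    n = len(ranking)
    return [n + 1 - r for r in ranking]

maths = [7, 4, 3, 10, 6, 2, 9, 8, 1, 5]
music = [5, 7, 3, 10, 1, 9, 6, 2, 8, 4]

print(round(tau(maths, music), 2), round(tau(conjugate(maths), music), 2))  # -0.07 0.07
print(round(rho(maths, music), 3), round(rho(conjugate(maths), music), 3))  # -0.103 0.103
```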


1.20 The reader must not expect to find that the numerical
values of τ and ρ are the same for any given pair of rankings, except
when there is complete agreement or disagreement. For the rankings
of 1.12 correlated with the natural order we have the following values:

Ranking        τ            ρ
a           + 0.11       + 0.14
b           + 0.56       + 0.64
c           − 0.24       − 0.37
d           + 0.02       + 0.03
e           + 0.60       + 0.45
f           − 0.56       − 0.76

These will illustrate the sort of differences which arise in practice.
We shall show in 5.15 that, when neither coefficient is too close to
+1 or −1, ρ is often about 50 per cent greater than τ in absolute
value. The coefficients have, in fact, different scales, like the different
scales of Centigrade and Fahrenheit thermometers. This will give
rise to no difficulty in practice, for in any particular investigation we
shall always work with the same kind of coefficient. The differences
emphasise nevertheless the importance of not attributing too much
importance to the actual magnitude of a rank correlation. If we
find a value of τ equal to 0.67 we can only say that there is
“two-thirds agreement” if we recall clearly the nature of the coefficient
and the scale of measurement which it sets up. The value of ρ might
be 0.75 and indicate “three-quarters agreement”. The two things
are not inconsistent. They represent statements by reference to
different scales.
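The comparison of the two coefficients can be reproduced with the Python sketch below. The six rankings are our reading of the table in 1.12, which is partly illegible in the copy used; ranking d in particular is a reconstruction chosen to agree with the tabulated τ, and should be treated as an assumption. Function names are our own.

```python
def tau_natural(ranking):
    """tau of a ranking against the natural order, by form (1.6)."""
    n = len(ranking)
    p = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if ranking[j] > ranking[i]
    )
    return 2 * p / (n * (n - 1) / 2) - 1

def rho_natural(ranking):
    """rho of a ranking against the natural order, by form (1.9)."""
    n = len(ranking)
    s_d2 = sum((r - i) ** 2 for i, r in enumerate(ranking, start=1))
    return 1 - 6 * s_d2 / (n ** 3 - n)

# Rankings a to f as read from 1.12 (row d is a reconstruction, see above).
rankings = {
    "a": [4, 7, 2, 10, 3, 6, 8, 1, 5, 9],
    "b": [1, 6, 2, 7, 3, 8, 4, 9, 5, 10],
    "c": [7, 10, 4, 1, 6, 8, 9, 5, 2, 3],
    "d": [6, 5, 4, 7, 3, 8, 2, 9, 10, 1],
    "e": [10, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "f": [10, 9, 8, 7, 6, 1, 2, 3, 4, 5],
}

table = {
    name: (round(tau_natural(r), 2), round(rho_natural(r), 2))
    for name, r in rankings.items()
}
for name, (t, r) in table.items():
    print(name, t, r)
```

Note that for ranking e the value of ρ is smaller than that of τ: a single member displaced from one end to the other is penalised more heavily by the squared differences than by the count of pairs.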


Stragglers 


1.21 In general, ρ is an easier coefficient to calculate than τ.
We shall see in subsequent chapters that from many other practical
and from most theoretical points of view τ is preferable to ρ, and the
majority of the work in this book will be based on the former. At
this stage we will not argue the matter at length, but there is one
practical point which is worth noticing.

It sometimes happens that when a ranking has been made, further
individuals are added which necessitate re-ranking. Similarly, in
writing down the ranks of a series of individuals which are
distinguished by variate-values or marks but are in disorder, we may
make mistakes and find at the end of the ranking that a few have been
omitted. This will require the calculation of ρ entirely de novo,
whereas the addition of members to a ranking does not require
a complete re-calculation of τ. An example will make the point clear.
Example 1.1

A confidential inquiry is sent out to a number of firms asking for
the rates of dividend which they propose to declare at their next
general meetings. We will suppose that they are all able to answer
this question, but that there is some reason to think that certain
firms will be reluctant to reply and will delay or will not reply at all.
We will also suppose that the dividends are all different. These are
not very realistic assumptions but they will serve for the purposes of
the example.

(A) Order of receipt:      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
(B) Percentage dividend:  15  13  12  16  25   8   9  14  17  11  18  20  10  21  19
(C) Rank of percentage:    8   6   5   9  15   1   2   7  10   4  11  13   3  14  12

The ranks of the percentages are shown in the last row. For the
rankings A and C we find S = 25,

\[
\tau = \frac{25}{105} = +0.24.
\]

We also find S(d²) = 392,

\[
\rho = 1 - \frac{6 \times 392}{3360} = +0.30.
\]

This suggests some small positive correlation between A and C, and in
Chapter 3 we shall see how to test its significance. That, however, is
not the point of the present example. Suppose that, after these
values of τ and ρ have been worked out, two more replies arrive with
percentages 7 and 23. Nearly all the ranks in row C above are
affected and have to be re-numbered. The differences d and the sum
S(d²) have to be ascertained anew. If, after this, further replies
arrive the work has to be done once more.

On the other hand, the effect of the addition of the two extra
values on S can be ascertained very simply. The new member with
percentage 7 and rank in the A-ranking of 16 merely has to be
considered in relation to the other fifteen members, and since it has
a lower percentage than any of them it adds −15 to the score S.
Similarly, the new member with percentage 23 adds 14. The new
score S is therefore 14 − 15 = −1 more than the old, i.e. is 24, and
the new value of τ is given by

\[
\tau = \frac{24}{136} = +0.18.
\]

In this way a kind of running total of τ can be ascertained without
the necessity of re-ranking at each stage.

It may be remarked that in this example the ranks of ranking C
are obtained from a variate, the percentage stated in the reply.
Those of the order of receipt are not obtained from a variate, although
we could, with sufficient patience, measure the time-intervals elapsing
between the receipt of consecutive replies and hence regard the
A-ranking as arranged according to a time-scale.
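The running total described in Example 1.1 may be sketched in Python as follows. The dividend figures are our reading of row (B), which is partly illegible in the copy used (the tenth and eleventh entries are taken as 11 and 18, the only values found consistent with S = 25 and with all dividends being different). The function names are our own, and ties are not provided for.

```python
def score_against_arrival_order(values):
    """S for the order of receipt (the natural order) against the values received."""
    s = 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            s += 1 if values[j] > values[i] else -1
    return s

def add_member(values, s, new_value):
    """Update S when one further reply arrives; no re-ranking is needed."""
    s += sum(1 if new_value > old else -1 for old in values)
    return values + [new_value], s

# Row (B) of Example 1.1 as read here; see the note above on entries 10 and 11.
dividends = [15, 13, 12, 16, 25, 8, 9, 14, 17, 11, 18, 20, 10, 21, 19]

s_initial = score_against_arrival_order(dividends)   # 25
replies, s = add_member(dividends, s_initial, 7)      # the late reply of 7 adds -15
replies, s = add_member(replies, s, 23)               # the late reply of 23 adds +14

n = len(replies)
tau = s / (n * (n - 1) / 2)
print(s_initial, s, round(tau, 2))  # 25 24 0.18
```

Only the comparisons of the new member with the existing members are made at each stage, which is the economy the example is meant to bring out.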


1.22 We conclude with three further examples of the use of rank 
correlations. 


Example 1.2


Twelve similar discs are constructed, ranging from light blue to
dark blue in colour. Their order is known objectively by a
colorimetric test. To test the ability of a dress designer to distinguish
shades she is shown these discs and asked to arrange them in order.
The results are as follows:

Objective order:                  1   2   3   4   5   6   7   8   9  10  11  12
Order assigned by the subject:    1   4   7   2   3   5   8  12  10   6  11   9

We require to measure the subject's ability to distinguish the different
shades of blue.
For the value of P we have

P = 11 + 8 + 5 + 8 + 7 + 6 + 4 + 0 + 1 + 2 + 0 = 52,

\[
\tau = \frac{104}{66} - 1 = +0.58.
\]

The correlation is positive and substantial, but far from perfect. We
shall show how to test its significance in Chapter 3.

In this example we are measuring the agreement between a
subjective and an objective order. The subject's failure to attain
complete success may be due to genuine inability to distinguish the
shades, to wandering attention or to other causes; but whatever the
cause we can test the subject's ability against a given objective order.


Example 1.3


Consider now the case where three judges rank a number of
competitors in a beauty contest as follows:

Judge A:    [ranking not legible in the source]
Judge B:    [ranking not legible in the source]
Judge C:    [ranking not legible in the source]

There is here no objective order such as existed in the previous
example. We are interested in how far the judges agree among
themselves, not in their agreement with some objective standard.

We find for the values of τ between pairs of judges

τ (A and B) = 0.33
τ (B and C) = 0.44
τ (C and A) = 0.67

This indicates that A and C agree more than A and B or B and C.
The agreement between A and B is poor.


Example 1.4 


The following Тађје 1.1 shows, for 
the value of external trade (imports plu 


In appropriate columns we have ran 
these two variates, 


are, on the whole, those of foreign trade; 

but Russia and China form notable except; 

is not very strong. 
It often happens, 
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TABLE 1.1 
TRADE AND POPULATION OF CERTAIN COUNTRIES IN 1938 


Trade 
Rank 
Imports plus Rank = 2 
Country f Export») according | Population according 
£ thousand to Trade (millions) Р io 
million opulation 
United Kingdom. 1-330 1 47-6 6 
ВА ЕИ у 1-024 2 130-0 3 
Denmark: = . 0:142 11 8-8 16 
Erang % , 2 0:450 4 42-0 9 
Germany. . . . 0-882 3 79:2 4 
Greece. . . : 0:045 17 T1 14 
Holland . . . 0-276 n 8-7 12 
CCC 0-232 8 48-4 8 
Japan %“ 5”. 0-309 5 | 72-8 5 
Norway . . . 0-098 14 2-9 17 
обје s 2. 0-105 13 170-4 2 
ВА %5 % 0:052 16 | 25-6 10 
Sweden 2. s 0-201 9 6-3 15 
Argentina 0:180 10 | 18-0 11 
Belgum . . < 0:307 6 | 8:4 13 
Brazil We. s 0-121 12 44-1 7 
China (ex. Man- 

саана). af a 0:085 15 410-0 1 


magnitude differs widely from one individual to another; Norway, 
for example, having a population of 2:9 million against China's 
410 million. In any discussion of relationship based on these variate- 
values we have to be careful that one or two large items do not swamp 
the effect of the smaller ones. By ranking the individuals we do 
something to restore the balance and to give each country a more 
equal voice, as it were, in the discussion. Whether this is a sound 
procedure depends on what we are discussing; but it is worth 
emphasising that there are occasions when the use of variates, though 
in a sense more accurate, may be more misleading than ranks because 
they do not correspond exactly to the relationship which we are really 
trying to measure. 

The reader who is familiar with the product-moment correlation
coefficient of ordinary statistical theory will see the force of these
comments when we say that the coefficient in the above table between
trade and population is only + 0.006. The effect of including Russia
and China in the calculations has been to reduce the average
relationship to practically zero, the average being heavily weighted by
the size of the populations of these two countries.


CHAPTER 2


INTRODUCTION TO THE GENERAL THEORY 
OF RANK CORRELATION 


2.1 In this chapter we shall begin the development of a general 
theory of rank correlation and shall demonstrate some results which 
were stated without proof in the previous chapter. The reader who 
is interested in practical applications and is prepared to take those 
results on trust can omit this chapter altogether ; but if he has some 
previous knowledge of the theory of variate-correlation * he may 
profit from a glance through it to see how the various coefficients in 
current use may be linked together within the scope of a single theory. 


The general correlation coefficient 

2.2 Suppose we have a set of $n$ objects which are being considered
in relation to two properties represented by $x$ and $y$. Numbering
the objects from 1 to $n$ for the purposes of identification in any
order we please, we may say that they exhibit values $x_1, \ldots, x_n$
according to $x$ and $y_1, \ldots, y_n$ according to $y$. These values
may be variates or ranks.

To any pair of individuals, say the $i$th and the $j$th, we will allot an
$x$-score, denoted by $a_{ij}$, subject only to the condition that
$a_{ij} = -a_{ji}$. Similarly we will allot a $y$-score, denoted by
$b_{ij}$, where $b_{ij} = -b_{ji}$. Denoting by $\Sigma$ summation over
all values of $i$ and $j$ from 1 to $n$, we define a generalised
correlation coefficient $\Gamma$ by the equation

$$\Gamma = \frac{\Sigma\, a_{ij} b_{ij}}{\sqrt{\left(\Sigma\, a_{ij}^2 \; \Sigma\, b_{ij}^2\right)}} \tag{2.1}$$

We regard $a_{ij}$ as zero if $i = j$.


$\tau$ as a particular case

2.3 This general definition includes $\tau$, $\rho$ and the
product-moment correlation $r$ as particular cases which arise when
particular methods of scoring are adopted.

Suppose we allot a score $+1$ if $p_i < p_j$ (where $p_i$ is the rank of
the $i$th member according to the $x$-quality) and $-1$ if $p_i > p_j$.
Then

$$a_{ij} = \begin{cases} +1, & p_i < p_j \\ -1, & p_i > p_j \end{cases} \tag{2.2}$$

and similarly for $b$. Thus the sum $\Sigma\, a_{ij} b_{ij}$ is equal to
twice the sum $S$ (twice because any given pair occurs once as $(i, j)$
and once as $(j, i)$ in the summation). Furthermore $\Sigma\, a_{ij}^2$ is
merely the number of terms $a_{ij}$, that is, $n(n - 1)$, and so for
$\Sigma\, b_{ij}^2$. It follows from substitution in (2.1) that $\Gamma$ is
equal to the coefficient $\tau$ as we defined it in Chapter 1.


* For an account of the statistical theory of correlation see G. Udny Yule
and M. G. Kendall, An Introduction to the Theory of Statistics, 13th edn.,
Chapters 11-16. This book will subsequently be referred to as the Introduction.


$\rho$ as a particular case


2.4 Instead of the simple $\pm 1$ let us write

$$a_{ij} = p_j - p_i \tag{2.3}$$

and similarly

$$b_{ij} = q_j - q_i \tag{2.4}$$

where $q_i$ is the rank of the $i$th member according to the $y$-quality.
Both $p_i$ and $q_i$ range from 1 to $n$, and hence the sums of squares
$\Sigma (p_i - p_j)^2$ and $\Sigma (q_i - q_j)^2$ are equal. From (2.1) we
then have

$$\Gamma = \frac{\Sigma (p_i - p_j)(q_i - q_j)}{\Sigma (p_i - p_j)^2} \tag{2.5}$$

Now

$$\sum_{i,j=1}^{n} (p_i - p_j)(q_i - q_j) = \sum_{i=1}^{n}\sum_{j=1}^{n} p_i q_i + \sum_{i=1}^{n}\sum_{j=1}^{n} p_j q_j - \sum_{i=1}^{n}\sum_{j=1}^{n} (p_i q_j + p_j q_i)$$

$$= 2n \sum_{i=1}^{n} p_i q_i - 2 \sum_{i} p_i \sum_{j} q_j$$

$$= 2n \sum_{i=1}^{n} p_i q_i - \tfrac{1}{2} n^2 (n + 1)^2 \tag{2.6}$$

since $\Sigma p_i$ and $\Sigma q_i$ are both equal to the sum of the first
$n$ natural numbers, namely $\tfrac{1}{2} n(n + 1)$.
We also have

$$S(d^2) = \Sigma (p_i - q_i)^2 = 2 \Sigma p_i^2 - 2 \Sigma p_i q_i \tag{2.7}$$

and hence, from (2.6),

$$\Sigma (p_i - p_j)(q_i - q_j) = 2n \Sigma p_i^2 - \tfrac{1}{2} n^2 (n + 1)^2 - n S(d^2). \tag{2.8}$$

But $\Sigma p_i^2$ is the sum of squares of the first $n$ natural numbers,
namely $\tfrac{1}{6} n(n + 1)(2n + 1)$, and (2.8) thus reduces on the right to

$$\tfrac{1}{6} n^2 (n^2 - 1) - n S(d^2). \tag{2.9}$$

Further

$$\Sigma (p_i - p_j)^2 = 2n \Sigma p_i^2 - 2 (\Sigma p_i)^2 = \tfrac{1}{6} n^2 (n^2 - 1) \tag{2.10}$$

and thus, on substituting from (2.9) and (2.10) in (2.5) we get

$$\Gamma = 1 - \frac{6 S(d^2)}{n^3 - n} \tag{2.11}$$

so that in this case $\Gamma$ is reduced to Spearman's $\rho$.


Product-moment correlation as a particular case 
2.5 Thirdly, suppose we base our scores on the actual variate-values
and write

$$a_{ij} = x_j - x_i, \qquad b_{ij} = y_j - y_i \tag{2.12}$$

Then

$$\tfrac{1}{2} \Sigma (x_j - x_i)(y_j - y_i) = n \Sigma x_i y_i - \Sigma x_i \, \Sigma y_j \tag{2.13}$$

$$\tfrac{1}{2} \Sigma (x_j - x_i)^2 = n \Sigma x_i^2 - (\Sigma x_i)^2 \tag{2.14}$$

Now the expression on the right in (2.13) is $n^2$ times the covariance
of $x$ and $y$, and that on the right of (2.14) is $n^2$ times the
variance of $x$.* From (2.1) we then have

$$\Gamma = \frac{\operatorname{cov}(x, y)}{\sqrt{(\operatorname{var} x \, \operatorname{var} y)}} \tag{2.15}$$

so that $\Gamma$ becomes in this case the ordinary product-moment
correlation of $x$ and $y$.
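
The three special cases of sections 2.3 to 2.5 can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names and the two small data sets are illustrative assumptions. It evaluates the generalised coefficient of (2.1) with a sign score (giving $\tau$) and with a difference score (giving $\rho$ on ranks and the product-moment $r$ on variates).

```python
from math import sqrt

def general_coefficient(xs, ys, score):
    """Gamma of (2.1): sum(a*b) / sqrt(sum(a^2) * sum(b^2)) over all ordered pairs (i, j)."""
    n = len(xs)
    sum_ab = sum_aa = sum_bb = 0.0
    for i in range(n):
        for j in range(n):
            a = score(xs[i], xs[j])   # x-score a_ij (zero when i == j)
            b = score(ys[i], ys[j])   # y-score b_ij
            sum_ab += a * b
            sum_aa += a * a
            sum_bb += b * b
    return sum_ab / sqrt(sum_aa * sum_bb)

def sign_score(u, v):
    """+1 if u < v, -1 if u > v, 0 if equal: the scoring that yields tau."""
    return (v > u) - (v < u)

def difference_score(u, v):
    """v - u: yields rho when applied to ranks and r when applied to variates."""
    return v - u

# Hypothetical untied rankings of five objects (illustrative only).
p = [1, 2, 3, 4, 5]
q = [2, 1, 4, 3, 5]
tau = general_coefficient(p, q, sign_score)
rho = general_coefficient(p, q, difference_score)

# Hypothetical variate values (illustrative only).
x = [1.0, 2.5, 4.0, 7.0]
y = [2.0, 1.0, 5.0, 6.0]
r = general_coefficient(x, y, difference_score)
```

Passing the scoring rule as a function keeps the single definition (2.1) in one place, which mirrors the way the text derives all three coefficients from one formula.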


* For the definition of these terms see 3.10.


2.6 It follows from the preceding paragraph that $\rho$ itself may be
regarded as a product-moment correlation between ranks considered
as variates. We will verify this directly.

For a set of values which are the first $n$ integers we have, as in the
previous section,

$$\Sigma p_i = \tfrac{1}{2} n(n + 1)$$

and hence the first moment (the mean) is given by

$$\mu_1' = \tfrac{1}{2}(n + 1). \tag{2.16}$$

Similarly

$$\Sigma p_i^2 = \tfrac{1}{6} n(n + 1)(2n + 1)$$

and hence the variance is given by

$$\mu_2 = \frac{1}{n} \Sigma p_i^2 - \mu_1'^2 = \tfrac{1}{12}(n^2 - 1). \tag{2.17}$$

From (2.7) and (2.10) we find

$$\frac{1}{2n} S(d^2) = \tfrac{1}{12}(n^2 - 1) - \left\{ \frac{1}{n} \Sigma p_i q_i - \tfrac{1}{4}(n + 1)^2 \right\}$$

so that the first product-moment, which is the expression in curly
brackets on the right, is given by

$$\mu_{11} = \tfrac{1}{12}(n^2 - 1) - \frac{1}{2n} S(d^2).$$

Thus the product-moment correlation is

$$\frac{\mu_{11}}{\sqrt{(\mu_2 \mu_2)}} = 1 - \frac{6 S(d^2)}{n^3 - n} = \rho.$$


possible апа assigns 
dividuals are in the 
e and gives greater 
if they are further apart 
(i.e. separated by more intervening members of the ranking). The 

value to the difference by 
measuring it on the variate scale, if one exists. The choice between
these methods, or a choice of other possible methods, depends on
practical considerations.


2.8 We shall make considerable use of the above approach in
later chapters, particularly in connection with sampling problems.
At this point we leave it temporarily to prove two assertions made
in Chapter 1, viz.

(a) in section 1.13, that the minimum number of interchanges
between neighbours required to transform one ranking into
another is simply related to the score $S$; and

(b) in section 1.18, that the values of $\rho$ obtained by correlating
a ranking A with a ranking B and its conjugate B' are equal
in magnitude but opposite in sign.


2.9 We will prove the second one first. It will be sufficient if we
consider a ranking A typified by $p_i$ correlated with the natural order
B $= 1, \ldots, n$ and its inverse B' $= n, \ldots, 1$. Denoting the value
of $S(d^2)$ for the second by $S'(d^2)$ we have

$$S(d^2) = \Sigma (p_i - i)^2 = \Sigma p_i^2 + \Sigma i^2 - 2 \Sigma i p_i = \tfrac{1}{3} n(n + 1)(2n + 1) - 2 \Sigma i p_i$$

$$S'(d^2) = \Sigma \{p_i - (n + 1 - i)\}^2 = \tfrac{1}{3} n(n + 1)(2n + 1) - 2 \Sigma (n + 1 - i) p_i$$

Hence

$$S(d^2) + S'(d^2) = \tfrac{2}{3} n(n + 1)(2n + 1) - 2 \Sigma (n + 1) p_i$$

$$= \tfrac{2}{3} n(n + 1)(2n + 1) - n(n + 1)^2 = \tfrac{1}{3}(n^3 - n).$$

Thus

$$\rho + \rho' = 2 - \frac{6}{n^3 - n} \{ S(d^2) + S'(d^2) \} = 0$$

which establishes the result required.
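
The result of 2.9 is easy to confirm by brute force. The sketch below is an illustration only, assuming Python and a hypothetical ranking `a`; it computes $\rho$ from (2.11) against the natural order and against its conjugate, and a helper repeats the check over every ranking of a given size.

```python
from itertools import permutations

def spearman_rho(p, q):
    """Spearman's rho for two untied rankings, using formula (2.11)."""
    n = len(p)
    s_d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return 1 - 6 * s_d2 / (n ** 3 - n)

a = [3, 1, 4, 2, 5]                 # hypothetical ranking A
b = list(range(1, len(a) + 1))      # natural order B
b_conj = b[::-1]                    # conjugate ranking B'

rho = spearman_rho(a, b)
rho_conj = spearman_rho(a, b_conj)

def conjugates_cancel(n):
    """True if rho(A, B) + rho(A, B') vanishes for every ranking A of 1..n."""
    natural = list(range(1, n + 1))
    reverse = natural[::-1]
    return all(abs(spearman_rho(p, natural) + spearman_rho(p, reverse)) < 1e-12
               for p in permutations(natural))
```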
2.10 To show that the number of interchanges $s$ is given by

$$s = \tfrac{1}{4} n(n - 1) - \tfrac{1}{2} S \tag{2.18}$$

we shall first prove that $s$ is not greater than the value on the right
in (2.18) and then that $s$ cannot be less than that value. The equality
will follow.
Define a unit function

$$m_{ij} = \begin{cases} 1 & \text{if } p_i > p_j \\ 0 & \text{if } p_i < p_j \end{cases}$$

The object with rank 1 may be transferred to the first place (on the
extreme left) by $p_1 - 1$ interchanges. This will move the object with
rank 2 to the right by $m_{12}$ places. To transfer this to the second
place will then require $p_2 - 2 + m_{12}$ interchanges. Similarly the
$i$th object will require

$$p_i - i + m_{1i} + m_{2i} + \ldots + m_{i-1,i} \tag{2.19}$$

interchanges. Adding all these together we find for the total number
of interchanges

$$\sum_{i=1}^{n} (p_i - i) + \sum_{i<j} m_{ij} = \sum_{i<j} m_{ij} \tag{2.20}$$

where $\sum_{i<j}$ denotes summation over values for which $i < j$. Now
we may write

$$S = \sum_{i<j} (1 - 2 m_{ij}), \tag{2.21}$$

for each unit contribution to $S$ (counting only those pairs for which
$i < j$ so as not to count everything twice) is $+1$ if $p_i < p_j$ and
$-1$ in the contrary case. Thus the number of interchanges given by
(2.20) is equal to

$$\sum_{i<j} \tfrac{1}{2} - \tfrac{1}{2} S = \tfrac{1}{4} n(n - 1) - \tfrac{1}{2} S,$$

since the summation of unity over values for which $i < j$ is the
number of pairs of members, which is $\tfrac{1}{2} n(n - 1)$. Hence, if $s$
is the minimum number of interchanges,

$$s \le \tfrac{1}{4} n(n - 1) - \tfrac{1}{2} S \tag{2.22}$$
Let $T$ be a sequence of interchanges which reduces a given
arrangement to the natural order. We classify the interchanges
composing $T$ into $n$ groups $T_1, T_2, \ldots, T_n$. $T_1$ consists of those
which involve the object 1, $T_2$ of those which involve 2 but not 1,
and so on, $T_n$ being an empty set.
In any group $T_i$ let $A_i$ be the number of interchanges which move
the object $i$ to the right and $B_i$ the number which move it to the left.
The number of interchanges required in $T_1$ will be at least $p_1 - 1$.
These interchanges will move object 2 $m_{12}$ places to the right. The
net result of operations in $T_2$ will be to move the object 2
$A_2 - B_2$ places to the right, and thus the total movement of object 2 is

$$m_{12} + A_2 - B_2$$

and this must equal $2 - p_2$. Hence

$$B_2 - A_2 = p_2 - 2 + m_{12}$$

and hence

$$B_2 + A_2 \ge p_2 - 2 + m_{12}.$$

In a similar way we have

$$B_i + A_i \ge p_i - i + m_{1i} + \ldots + m_{i-1,i}.$$

Adding such inequalities for $i = 1, \ldots, n - 1$ and remembering
that

$$p_n - n + m_{1n} + \ldots + m_{n-1,n} = 0$$

we have

$$s = \sum_{i} (A_i + B_i) \ge \Sigma p_i - \Sigma i + \sum_{i<j} m_{ij} \ge \tfrac{1}{4} n(n - 1) - \tfrac{1}{2} S \tag{2.23}$$

and thus from (2.22) and (2.23) we establish (2.18).
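
Relation (2.18) can also be confirmed by exhaustive search for small $n$. The following Python sketch is an illustration, not the author's procedure: it finds the true minimum number of neighbour interchanges by a breadth-first search over arrangements and compares it with $\tfrac{1}{4} n(n-1) - \tfrac{1}{2} S$. The function names and the sample arrangement are assumptions made for the example.

```python
from collections import deque
from itertools import permutations

def score_s(seq):
    """Score S of an arrangement against the natural order (+1 per pair in order, -1 otherwise)."""
    n = len(seq)
    return sum((seq[j] > seq[i]) - (seq[j] < seq[i])
               for i in range(n) for j in range(i + 1, n))

def formula_interchanges(seq):
    """Right-hand side of (2.18), n(n-1)/4 - S/2, in integer arithmetic."""
    n = len(seq)
    return (n * (n - 1) - 2 * score_s(seq)) // 4

def min_interchanges_bfs(seq):
    """Smallest number of swaps of neighbours that sorts seq, found by breadth-first search."""
    start, goal = tuple(seq), tuple(sorted(seq))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return dist[cur]
        for k in range(len(cur) - 1):
            nxt = cur[:k] + (cur[k + 1], cur[k]) + cur[k + 2:]
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)

def all_agree(n):
    """True if search and formula agree for every arrangement of 1..n."""
    return all(min_interchanges_bfs(p) == formula_interchanges(p)
               for p in permutations(range(1, n + 1)))

example = [3, 1, 4, 2]      # hypothetical arrangement
```

The search is exponential in $n$ and is meant only as a check; for practical work the formula itself is the efficient route.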


Spearman's footrule 

2.11 To conclude this chapter we will consider briefly a
coefficient based, not on $S(d^2)$ like $\rho$, but on $S|d|$ where $|d|$
stands for the absolute value of $d$, i.e. its value without regard to sign.
This coefficient, sometimes known as Spearman's "footrule", is not
of the general type of (2.1). Let us put

$$R = 1 - \frac{3 S|d|}{n^2 - 1} \tag{2.24}$$

If two rankings are identical, $S|d| = 0$ and $R = 1$ as we should
require.

Now if one ranking is the inverse of the other, suppose that $n$ is
odd and equals $2m + 1$. Then, as in 1.15,

$$S|d| = 2\{2m + (2m - 2) + (2m - 4) + \ldots + 4 + 2\} = 2m(m + 1)$$

Hence

$$R = 1 - \frac{6m(m + 1)}{(2m + 1)^2 - 1} = -0.5 \tag{2.25}$$

If $n$ is even, say $2m$, we find similarly

$$S|d| = 2m^2,$$

and hence

$$R = 1 - \frac{6m^2}{4m^2 - 1} = -0.5\left\{1 + \frac{3}{n^2 - 1}\right\} \tag{2.26}$$

Thus the coefficient $R$ cannot have a minimum value $-1$ unless
$n = 2$. For large even $n$ it rapidly approaches $-0.5$ and for odd $n$
it must be $-0.5$ in all cases. This is a defect which cannot be
entirely remedied by taking a different multiplier from $3/(n^2 - 1)$
for $S|d|$ in (2.24).

If we were to write

$$R' = 1 - \frac{4 S|d|}{n^2}$$

$R'$ would become $-1$ for even $n$ and $-1 + 2/n^2$ for odd $n$.


2.12 Moreover, $R$ is much less sensitive than $\tau$ or $\rho$. If, for
example, we write down the 24 permutations of 1 to 4 and correlate
them with the natural order we find:


Values of ρ     Values of R     Frequencies
    1.0             1.0              1
    0.8             0.6              3
    0.6             0.2              1
    0.4             0.2              4
    0.2             0.2              2
    0.0           − 0.2              2
  − 0.2           − 0.2              2
  − 0.4           − 0.2              4
  − 0.6           − 0.6              1
  − 0.8           − 0.2              1
  − 0.8           − 0.6              2
  − 1.0           − 0.6              1

                  TOTAL             24


For the same value of $R$, e.g. 0.2, $\rho$ may vary from 0.2 to 0.6; and
for $R = -0.2$, $\rho$ may vary from 0.0 to $-0.8$.
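
The joint tabulation of $\rho$ and $R$ for the 24 permutations can be regenerated mechanically. The sketch below is a hedged illustration in Python; exact fractions are used so that equal values group together without rounding problems, and the names `rho_and_footrule` and `table` are assumptions of the example.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def rho_and_footrule(p):
    """Return (rho, R) for a ranking p of 1..n correlated with the natural order."""
    n = len(p)
    s_d2 = sum((a - i) ** 2 for i, a in enumerate(p, start=1))   # S(d^2)
    s_abs = sum(abs(a - i) for i, a in enumerate(p, start=1))    # S|d|
    rho = 1 - Fraction(6 * s_d2, n ** 3 - n)                     # formula (2.11)
    footrule = 1 - Fraction(3 * s_abs, n ** 2 - 1)               # formula (2.24)
    return rho, footrule

# Frequency of each (rho, R) pair over all 24 permutations of 1..4.
table = Counter(rho_and_footrule(p) for p in permutations(range(1, 5)))

for (rho, footrule), freq in sorted(table.items(), reverse=True):
    print(f"{float(rho):6.1f} {float(footrule):6.1f} {freq:4d}")
```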


into account with the analytical difficulties [...] the sampling
distribution of $R$, we may safely ignore
it in favour of either $\tau$ or $\rho$.
nged then $\rho$ increases.
f + 1 as the іп, 
more concordant—an obviously useful and desirab] 


CHAPTER 3 
TIED RANKS 


3.1 In practical applications of ranking methods there sometimes 
arise cases in which two or more individuals are so similar that no 
preference can be expressed between them. When an observer is 
ranking members by subjective judgments this effect may be due 
either to a genuine indistinguishability of the objects or to failure by 
the observer to distinguish such differences as exist. The members 
are then said to be tied. The arrangement of students in order of 
merit or by reference to examination marks is a familiar source of 
ties of this kind. 


3.2 The method which we shall adopt of allocating rank-numbers
to tied individuals is to average the ranks which they would possess
if they were distinguishable. For instance, if the observer ties the
third and fourth members each is allotted the number 3½, and if he
ties the second to the seventh inclusive, each is allotted the number
$\tfrac{1}{6}(2 + 3 + 4 + 5 + 6 + 7) = 4\tfrac{1}{2}$. This is sometimes known as the
"mid-rank method". When there is nothing to choose between
individuals, we must clearly rank them all alike if we rank them at
all; and our method has the advantage that the sum of the ranks
for all members remains the same as for an untied ranking.
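
As a minimal sketch of the mid-rank method (in Python, with hypothetical input values and an assumed function name), the routine below ranks a list of scores from smallest to largest and gives every group of equal scores the mean of the ranks the group occupies.

```python
def mid_ranks(values):
    """Rank values from 1 (smallest) to n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + 1 + end + 1) / 2      # mean of ranks pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = average
        pos = end + 1
    return ranks

# Third and fourth members tied (hypothetical scores).
two_tied = mid_ranks([10, 20, 30, 30, 50])
# Second to seventh members tied (hypothetical scores).
six_tied = mid_ranks([1, 2, 2, 2, 2, 2, 2, 8])
```

In both cases the ranks still add up to $\tfrac{1}{2} n(n+1)$, which is the property the text singles out.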


3.3 We have now to consider the effect of ties on the calculation
of $\tau$ and $\rho$.

In 1.9 we saw that a score of $+1$ or $-1$ was allotted to a pair of
members according as their ranks were in the right order or not. If
they are tied we shall allot the score zero, midway between the two
values which they might assume if they were not tied. The score $S$
is then easily calculated.


3.4 A new point arises, however, in regard to the calculation of
the denominator by which $S$ is to be divided to obtain $\tau$. We have
two possibilities:

(a) We may use the denominator $\tfrac{1}{2} n(n - 1)$ as for the untied form
of $\tau$;
(b) We can replace $\tfrac{1}{2} n(n - 1)$ by $\tfrac{1}{2} \sqrt{\{\Sigma a_{ij}^2 \; \Sigma b_{ij}^2\}}$, where $a_{ij}$ is the
score of the $i$th and $j$th members in one ranking and $b_{ij}$ is the
corresponding score in the other.

Where no ties exist, any term $a_{ij}^2$ is unity, so that $\Sigma a_{ij}^2$ reduces to
the number of possible terms, namely $n(n - 1)$; similarly for $\Sigma b_{ij}^2$,
so that the expression $\tfrac{1}{2} \sqrt{\{\Sigma a_{ij}^2 \; \Sigma b_{ij}^2\}}$ reduces to $\tfrac{1}{2} n(n - 1)$, as it
should. The reason for adopting this expression in the tied case will
be clear from 2.2 and 2.3.

If there is a tie of $t$ consecutive members all the scores arising from
any pair chosen from them are zero. There are $t(t - 1)$ such pairs.
Consequently the sum $\Sigma a_{ij}^2$ is $n(n - 1) - \sum_t t(t - 1)$, where $\sum_t$ stands
for the summation over various sets of ties; for $n(n - 1)$ is the sum
which would be obtained if no ties were present, and this is to be
reduced in virtue of the zero scores arising from the ties. If, therefore,
we write

$$T = \tfrac{1}{2} \sum_t t(t - 1) \tag{3.1}$$

for ties in one ranking and

$$U = \tfrac{1}{2} \sum_u u(u - 1) \tag{3.2}$$

for ties in the other, our alternative form of the coefficient $\tau$ for tied
ranks may be written

$$\tau_b = \frac{S}{\sqrt{\{\tfrac{1}{2} n(n - 1) - T\}} \, \sqrt{\{\tfrac{1}{2} n(n - 1) - U\}}} \tag{3.3}$$

Before discussing the alternative forms further, let us consider an
arithmetical example.


Example 3.1


Two rankings are given as follows:


A:   1    2½   2½   4½   4½   6½   6½   8    9½   9½
B:   1    2    4½   4½   4½   4½   8    8    8    10

Except for ties, both rankings are in the same order and the correlation
is high. Consider the first member in association with the other 9:
every such pair is in the same order in both rankings, and the
contribution to $S$ is 9. The second and third members are tied in A,
so that this pair scores zero whatever the B-ranking; the second member
associated with the remaining seven contributes 7. Proceeding in this
way, the full score is

$$S = 9 + 7 + 4 + 4 + 4 + 3 + 1 + 1 + 0 = 33.$$

If we adopt alternative (a) of 3.4 the value of $\tau$ is then given by

$$\tau_a = \frac{33}{45} = 0.733.$$

Under alternative (b) we have, for the A-ranking,

$$\tfrac{1}{2} \Sigma a_{ij}^2 = 45 - \tfrac{1}{2}(2 \times 1) - \tfrac{1}{2}(2 \times 1) - \tfrac{1}{2}(2 \times 1) - \tfrac{1}{2}(2 \times 1) = 41$$

and for the B-ranking

$$\tfrac{1}{2} \Sigma b_{ij}^2 = 45 - \tfrac{1}{2}(4 \times 3) - \tfrac{1}{2}(3 \times 2) = 36.$$

Hence

$$\tau_b = \frac{33}{\sqrt{(41 \times 36)}} = +0.859.$$


The value given by alternative (b) must, of course, be greater than
that of (a) in all cases. In the present instance it is substantially
greater.
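
The arithmetic of this example can be retraced with a short program. The sketch below is an illustration in Python under one stated assumption: the two rankings are entered as the mid-rank values 1, 2½, 2½, 4½, 4½, 6½, 6½, 8, 9½, 9½ and 1, 2, 4½, 4½, 4½, 4½, 8, 8, 8, 10, which is a reading of the printed rows chosen to be consistent with the totals quoted in the example. The function name is hypothetical.

```python
from math import sqrt

def tau_with_ties(p, q):
    """Return (S, tau_a, tau_b) for two rankings that may contain ties."""
    n = len(p)
    s = tied_p = tied_q = 0
    for i in range(n):
        for j in range(i + 1, n):
            a = (p[j] > p[i]) - (p[j] < p[i])   # 0 when the pair is tied in p
            b = (q[j] > q[i]) - (q[j] < q[i])   # 0 when the pair is tied in q
            s += a * b
            tied_p += (a == 0)                  # accumulates T of (3.1)
            tied_q += (b == 0)                  # accumulates U of (3.2)
    pairs = n * (n - 1) // 2
    tau_a = s / pairs
    tau_b = s / sqrt((pairs - tied_p) * (pairs - tied_q))
    return s, tau_a, tau_b

# Rankings as read from Example 3.1 (see the assumption stated above).
rank_a = [1, 2.5, 2.5, 4.5, 4.5, 6.5, 6.5, 8, 9.5, 9.5]
rank_b = [1, 2, 4.5, 4.5, 4.5, 4.5, 8, 8, 8, 10]

s, tau_a, tau_b = tau_with_ties(rank_a, rank_b)
```

Counting tied pairs directly gives the same $T$ and $U$ as $\tfrac{1}{2} \sum t(t-1)$, because a tie of extent $t$ contains exactly $\tfrac{1}{2} t(t-1)$ pairs.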


3.5 From the general point of view developed in Chapter 2, the
appropriate form of coefficient, as a true measure of correlation
between two sets of numbers, is $\tau_b$. For example, if we are measuring
the agreement between two judges in arranging a set of candidates in
order of merit (no objective order necessarily existing) we should use
$\tau_b$. Both judges may be wrong in relation to some objective order,
and they may disagree with other judges, but that is not the point.
We are measuring their agreement, not their accuracy.

Suppose, for instance, that both rankings are the same, that the
last member of each is $n$, and that all the others are tied and hence
have rank $\tfrac{1}{2} n$. Then $\tau_b = 1$, as it should be to express complete
agreement between the rankings. But we also have, from (3.1) and
(3.2),

$$T = U = \tfrac{1}{2}(n - 1)(n - 2)$$

and hence

$$\tfrac{1}{2} n(n - 1) - T = n - 1.$$

Thus, since the score $S$ is also $n - 1$ (confirming that $\tau_b = 1$), we have

$$\tau_a = \frac{n - 1}{\tfrac{1}{2} n(n - 1)} = \frac{2}{n} \tag{3.4}$$

and for large $n$ this is nearly zero. Clearly $\tau_a$ is an inappropriate
measure of agreement.


3.6 Nevertheless, there may be cases where $\tau_a$ is a better
measure than $\tau_b$. Suppose that there really exists an objective order.
The purpose of correlating a ranking assigned by an observer is then
[...] produce ties
because there really is an objective order. In such a case it may be
argued that the full divisor $\tfrac{1}{2} n(n - 1)$ should be used in calculating
$\tau$, i.e. that $\tau_a$ is the appropriate form.

Consider the case where our ranking is the natural order $1, \ldots, n$
and the other has the first $(n - 1)$ members tied (with ranks each $\tfrac{1}{2} n$)
and the last member ranked as $n$. As in (3.4) we have

$$\tau_a = \frac{2}{n}$$

whereas

$$\tau_b = \frac{n - 1}{\sqrt{\{\tfrac{1}{2} n(n - 1)(n - 1)\}}} = \sqrt{\left(\frac{2}{n}\right)} \tag{3.5}$$

For example, with $n = 9$, $\tau_a = 0.22$ and $\tau_b = 0.47$.
seems nearer to what we sh, 
with an objective order, 
wrong order, and has ra 
unable to distinguish between + 
а measure of his ability seems 


tas due to inability 
*: what is the average 


value of $\tau$ over all the $t!$ possible ways of assigning integral ranks *
to the tied members?

If we replace any tied set $t$ by integral ranks in the $t!$ possible
orders we get the same result, on the average, as if we replace the
scores $a_{ij}$ for the tied members by zero; for in the $t!$ arrangements
any pair will occur an equal number of times in the order $XY$ and in
the order $YX$, so that the allocation of $+1$ in the one case and $-1$
in the other is equivalent to allocating zero on the average. Thus
we may regard $\tau_a$ as an average coefficient, such as would be obtained
if the tied ranks were replaced by integral ranks in all possible ways,
$\tau$ calculated for each, and an arithmetic mean taken of the resulting
values.


* The symbol $t!$ (read "factorial $t$") stands for the number
$1 \times 2 \times 3 \times \ldots \times (t - 1) \times t$.
It is the number of different ways of arranging $t$ objects in order. For
instance, four objects A, B, C, D can be arranged in 24 different orders.
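
The averaging interpretation of $\tau_a$ can be verified directly on a small case. The Python sketch below is an illustration under stated assumptions: it uses the mid-rank reading of the rankings of Example 3.1 given in the comments, replaces every tied group by the integral ranks it spans in all possible orders, computes the untied $\tau$ for each of the resulting pairs of rankings, and compares the exact mean with $\tau_a$. All function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

def score_s(p, q):
    """Score S with zero allotted to any pair tied in either ranking."""
    n = len(p)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            a = (p[j] > p[i]) - (p[j] < p[i])
            b = (q[j] > q[i]) - (q[j] < q[i])
            total += a * b
    return total

def untied_versions(ranks):
    """Yield every ranking obtained by breaking each mid-rank tie in all possible orders."""
    groups = {}
    for idx, r in enumerate(ranks):
        groups.setdefault(r, []).append(idx)
    options = []
    for r, members in groups.items():
        t = len(members)
        lowest = int(r - (t - 1) / 2)     # smallest integral rank covered by the tie
        options.append([list(zip(members, perm))
                        for perm in permutations(range(lowest, lowest + t))])
    for combo in product(*options):
        out = [0] * len(ranks)
        for assignment in combo:
            for idx, value in assignment:
                out[idx] = value
        yield out

# Assumed reading of the two rankings of Example 3.1 (mid-ranks).
rank_a = [1, 2.5, 2.5, 4.5, 4.5, 6.5, 6.5, 8, 9.5, 9.5]
rank_b = [1, 2, 4.5, 4.5, 4.5, 4.5, 8, 8, 8, 10]

pairs = len(rank_a) * (len(rank_a) - 1) // 2
b_versions = list(untied_versions(rank_b))
taus = [Fraction(score_s(pa, qb), pairs)
        for pa in untied_versions(rank_a) for qb in b_versions]

mean_tau = sum(taus) / len(taus)
tau_a = Fraction(score_s(rank_a, rank_b), pairs)
```

Exact fractions are used so that the equality of the mean and $\tau_a$ can be tested without floating-point tolerance.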


3.8 We turn to consider the analogous problems for the rank
correlation coefficient $\rho$. Again we shall have the choice of two
denominators and two coefficients, which we may denote by $\rho_a$ and $\rho_b$.
If there are sets of ties in the two rankings typified by $t$ and $u$ we
define

$$T' = \tfrac{1}{12} \sum_t (t^3 - t), \qquad U' = \tfrac{1}{12} \sum_u (u^3 - u) \tag{3.6}$$

Then we have

$$\rho_a = 1 - \frac{6\{S(d^2) + T' + U'\}}{n^3 - n} \tag{3.7}$$

$$\rho_b = \frac{\tfrac{1}{6}(n^3 - n) - S(d^2) - T' - U'}{\sqrt{\{\tfrac{1}{6}(n^3 - n) - 2T'\}\{\tfrac{1}{6}(n^3 - n) - 2U'\}}} \tag{3.8}$$

We shall prove these formulae below, but before doing so we will
consider an example.


Example 3.2
Consider again the two rankings of Example 3.1:


A:   1    2½   2½   4½   4½   6½   6½   8    9½   9½
B:   1    2    4½   4½   4½   4½   8    8    8    10

In the first ranking there are four tied pairs ($t = 2$) and hence

$$T' = \tfrac{4}{12}(2^3 - 2) = 2.$$

In the second there is one set for which $t = 4$ and one for which $t = 3$,
so that

$$U' = \tfrac{1}{12}(4^3 - 4 + 3^3 - 3) = 7.$$

We also find

$$S(d^2) = 13.$$

Hence, from (3.7),

$$\rho_a = 1 - \frac{6(13 + 2 + 7)}{990} = 0.867$$

and from (3.8)

$$\rho_b = \frac{165 - 22}{\sqrt{(161 \times 151)}} = 0.917.$$
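
The quantities $T'$, $U'$, $S(d^2)$, $\rho_a$ and $\rho_b$ of this example can be recomputed with the following Python sketch. It is an illustration only: the rankings are entered as the mid-rank values assumed for Example 3.1, and the function names are hypothetical. As a cross-check the sketch also computes the ordinary product-moment correlation of the two sets of mid-ranks, which the argument of 3.10 identifies with $\rho_b$.

```python
from collections import Counter
from math import sqrt

def tie_correction(ranks):
    """(1/12) * sum(t^3 - t) over the groups of tied ranks, as in (3.6)."""
    return sum(t ** 3 - t for t in Counter(ranks).values()) / 12

def rho_with_ties(p, q):
    """Return (S(d^2), T', U', rho_a, rho_b) following (3.7) and (3.8)."""
    n = len(p)
    s_d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    t1, u1 = tie_correction(p), tie_correction(q)
    base = (n ** 3 - n) / 6
    rho_a = 1 - (s_d2 + t1 + u1) / base
    rho_b = (base - s_d2 - t1 - u1) / sqrt((base - 2 * t1) * (base - 2 * u1))
    return s_d2, t1, u1, rho_a, rho_b

def product_moment(p, q):
    """Ordinary product-moment correlation of two equal-length sequences."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    return cov / sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mq) ** 2 for b in q))

# Assumed reading of the rankings of Example 3.1 (mid-ranks).
rank_a = [1, 2.5, 2.5, 4.5, 4.5, 6.5, 6.5, 8, 9.5, 9.5]
rank_b = [1, 2, 4.5, 4.5, 4.5, 4.5, 8, 8, 8, 10]

s_d2, t_prime, u_prime, rho_a, rho_b = rho_with_ties(rank_a, rank_b)
r_ranks = product_moment(rank_a, rank_b)
```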


3.9 It is useful to note that (3.8) can be put in the form

$$\rho_b = \frac{\tfrac{1}{6}(n^3 - n) - S(d^2) - T' - U'}{\{\tfrac{1}{6}(n^3 - n) - (T' + U')\} \sqrt{\left[1 - \dfrac{(T' - U')^2}{\{\tfrac{1}{6}(n^3 - n) - (T' + U')\}^2}\right]}} \tag{3.9}$$

Thus, if $T'$ and $U'$ are small compared with $(n^3 - n)$, we have
approximately

$$\rho_b = 1 - \frac{S(d^2)}{\tfrac{1}{6}(n^3 - n) - (T' + U')} \tag{3.10}$$

or, slightly more approximately,

$$\rho_b = 1 - \frac{S(d^2)}{\tfrac{1}{6}(n^3 - n)} \tag{3.11}$$

which is the ordinary formula for $\rho$ in the untied case. We therefore
expect that when the ties are not very numerous the use of the
formulae (3.9) or (3.10) will make little difference to the numerical
values given by the use of (3.11).


For instance, in the data of Example 3.2 we find for the form
(3.10)

$$\rho_b = 1 - \frac{13}{165 - 9} = 0.9167$$

and for the form (3.11)

$$\rho_b = 1 - \frac{13}{165} = 0.9212.$$

The value given by (3.8) is 0.9171. All three values agree to the
nearest second place of decimals.

3.10 To establish formulae (3.7) and (3.8) we make use
of the results of Chapter 2. We saw in 2.6 that $\rho$ may be regarded
as the product-moment correlation between the ranks. Suppose we
adopt the same viewpoint when some of the ranks are tied.
For a set of untied ranks the sum of squares is

$$\Sigma p_i^2 = \tfrac{1}{6} n(n + 1)(2n + 1)$$

where $p_i$ is the rank of the $i$th individual; and the sum of ranks is

$$\Sigma p_i = \tfrac{1}{2} n(n + 1).$$

If a set of $t$ ranks are tied the sum of ranks remains the same but the
sum of squares is altered. Suppose the ranks $p_k + 1, \ldots, p_k + t$
are tied. Then the sum of squares is reduced by

$$(p_k + 1)^2 + (p_k + 2)^2 + \ldots + (p_k + t)^2 - t\{p_k + \tfrac{1}{2}(t + 1)\}^2$$

$$= t p_k^2 + 2 p_k (1 + 2 + \ldots + t) + (1^2 + 2^2 + \ldots + t^2) - t\{p_k^2 + p_k(t + 1) + \tfrac{1}{4}(t + 1)^2\}$$

$$= \tfrac{1}{12}(t^3 - t).$$


It is convenient to introduce at this point (for the reader who has
omitted Chapter 2) the idea of variance. This quantity is defined as
the mean of the squares of deviations of a set of values from their
mean. It is the square of the standard deviation. Thus for an untied
ranking the variance is

$$\frac{1}{n} \Sigma \{p_i - \tfrac{1}{2}(n + 1)\}^2 = \frac{1}{n} \Sigma p_i^2 - \tfrac{1}{4}(n + 1)^2 = \tfrac{1}{12}(n^2 - 1),$$

the summation extending over the $n$ ranks. It follows that for a tied
ranking the variance is

$$\tfrac{1}{12}(n^2 - 1) - \frac{1}{12n} \sum_t (t^3 - t) = \frac{1}{2n} \{\tfrac{1}{6}(n^3 - n) - 2T'\} \tag{3.12}$$


If two quantities $x$ and $y$ are measured from their mean we may
similarly define their covariance $\operatorname{cov}(x, y)$ as the function $\frac{1}{n} \Sigma (xy)$.
Since

$$x^2 + y^2 - 2xy = (x - y)^2$$

we have

$$\operatorname{var} x + \operatorname{var} y - 2 \operatorname{cov}(x, y) = \operatorname{var}(x - y).$$


The correlation $\rho_b$ may be defined as

$$\frac{\operatorname{cov}(p, q)}{\sqrt{(\operatorname{var} p \, \operatorname{var} q)}} = \frac{1}{2} \, \frac{\operatorname{var} p + \operatorname{var} q - \operatorname{var}(p - q)}{\sqrt{(\operatorname{var} p \, \operatorname{var} q)}} \tag{3.13}$$

It is easily verified that this gives $\rho$ for the untied case. Let us apply
it to the tied case. Since $\operatorname{var}(p - q)$ is still equal to $\frac{1}{n} S(d^2)$ we find
from (3.13), on using (3.12),

$$\rho_b = \frac{\tfrac{1}{6}(n^3 - n) - 2T' + \tfrac{1}{6}(n^3 - n) - 2U' - 2S(d^2)}{2 \sqrt{\{\tfrac{1}{6}(n^3 - n) - 2T'\}\{\tfrac{1}{6}(n^3 - n) - 2U'\}}}$$

which reduces to (3.8). The formula (3.7) for $\rho_a$ follows in a similar
way.

3.11 Whath 
in which z, and T may be preferred 


nkings, for the 
and thus the result follows. 
Example 3.3
If two rankings are identical, the last member in each being ranked $n$
and the other $n - 1$ tied with rank $\tfrac{1}{2} n$, we have $\rho_b = 1$.
But for $\rho_a$ we find

$$T' = U' = \tfrac{1}{12}\{(n - 1)^3 - (n - 1)\} = \tfrac{1}{12} n(n - 1)(n - 2)$$

and $S(d^2) = 0$.
Thus, from (3.7),

$$\rho_a = 1 - \frac{n(n - 1)(n - 2)}{n^3 - n} = \frac{3}{n + 1}.$$

We find the same kind of difference between the two types of $\rho$ as
between the two types of $\tau$ in 3.5.


3.12 A fairly common problem in psychology is to measure the
relationship between two qualities, one of which is given by a ranking
and the other a dichotomy or classification into two classes according
as the individual possesses a certain attribute or not. Consider the
following ranking of 15 girls and boys according to merit in an
examination:

Rank:  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
Sex:   B  B  G  B  G  G  B  B  B  G   B   G   B   G   G

We are interested here in whether there is any connection between sex
and success, that is, whether the boys did better than the girls on the
average or vice versa.

We will imagine that the division into sex is itself a ranking.
There are 8 boys and 7 girls and we will suppose that the first
8 members of the ranking by sex are tied, and so for the next group
of 7. The actual values of the tied ranks will be 4½ in the first case
and 12 in the second, so that the pair of rankings may be written:


Ranking A:  1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
Ranking B:  4½  4½  12  4½  12  12  4½  4½  4½  12  4½  12  4½  12  12


We may now calculate $\tau_b$ for these rankings. We find

$$S = 7 + 7 - 6 + 6 - 5 - 5 + 4 + 4 + 4 - 2 + 3 - 1 + 2 - 0 = 18$$

$$T = 0$$

$$U = \tfrac{1}{2}(8 \times 7) + \tfrac{1}{2}(7 \times 6) = 49$$

$$\tau_b = \frac{18}{\sqrt{(105 \times 56)}} = +0.24$$

This indicates some positive correlation between order of success and
the order of sex, which we have chosen by putting boys first. (Had
we put girls first, of course, we should have obtained $\tau_b = -0.24$,
leading to the same conclusion.) In short, the boys seem to be better
than the girls. Whether the evidence is sufficiently indicative of
"real" correlation is a matter of significance which we shall discuss
in the next chapter.


Example 3.4

A number of workers in a factory were interviewed and an assessment
made of their adaptation to living conditions. They were
assessed as "efficient" or "overactive". Statements by the men
were also available concerning the frequency of nocturia. For men
aged 50-59 years the following was observed:

Ranking according to frequency of nocturia (least nocturia given
highest rank):

"Efficient":   2½  2½  2½  2½  6½  6½  10  10  10  10  14  14
"Overactive":  5  10  14  16  17

A cursory inspection of the figures indicates that the overactive group
had the greatest frequency of nocturia. Let us measure the relationship
with $\tau_b$. We find, writing E and O for efficient and overactive
respectively:

Ranking A:  2½  2½  2½  2½  5  6½  6½  10  10  10  10  10  14  14  14  16  17
Ranking B:  E   E   E   E   O  E   E   O   E   E   E   E   E   E   O   O   O

If we wish we can replace the E's and O's by rankings, but this is
unnecessary for the calculation of $S$. We find

$$S = 5 + 5 + 5 + 5 - 8 + 4 + 4 - 2 + 3 + 3 + 3 + 3 + 2 + 2 = 34.$$

$$T = \tfrac{1}{2}\{(4 \times 3) + (2 \times 1) + (5 \times 4) + (3 \times 2)\} = 20.$$

$$U = \tfrac{1}{2}(12 \times 11) + \tfrac{1}{2}(5 \times 4) = 76.$$

Thus

$$\tau_b = \frac{34}{\sqrt{\{(136 - 20)(136 - 76)\}}} = +0.41.$$

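3.13 If a ranking consists of a dichotomy, with $x$ and $y$
members in the two classes, we have

$$\tfrac{1}{2} n(n - 1) - T = \tfrac{1}{2} n(n - 1) - \tfrac{1}{2} x(x - 1) - \tfrac{1}{2} y(y - 1)$$

$$= \tfrac{1}{2}\{(x + y)(x + y - 1) - x(x - 1) - y(y - 1)\}$$

$$= xy.$$

Thus we have

$$\tau_b = \frac{S}{\sqrt{[xy\{\tfrac{1}{2} n(n - 1) - U\}]}} \tag{3.14}$$
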
3.14 Consider now the extreme case when both rankings consist
of a dichotomy, one, say, into $x$ and $n - x = y$ members, the other
into $p$ and $n - p = q$ members. We then have for the denominator
in $\tau_b$ the simple expression $\sqrt{(xypq)}$. We may express the data in
what is usually known as a 2 × 2 table, thus:


                                    First Quality
                           Possessing   Not-possessing   Totals
Second     Possessing          a              b             p
Quality    Not-possessing      c              d             q
           Totals              x              y             n
                                                                 (3.15)


Any member of the class possessing both qualities ($a$ in number) taken
with any member of the class not possessing either ($d$ in number)
gives a pair with the same order in either ranking and hence
contributes $+1$ to $S$. Similarly any member of the $b$-class with any
member of the $c$-class contributes $-1$. The others contribute
nothing. Hence

$$S = ad - bc$$

and

$$\tau_b = \frac{ad - bc}{\sqrt{(xypq)}} \tag{3.16}$$

This is one useful form of a coefficient measuring association in
a 2 × 2 table. There are other coefficients of the same kind, but it
is interesting that for the extreme case when both rankings are so tied
as to be dichotomous the coefficient $\tau$ becomes one of the measures of
association which have been developed in other connections.*


* See, for instance, M. G. Kendall, The Advanced Theory of Statistics,
Vol. 1, Chapter 13. If $\chi^2$ is the square contingency for the 2 × 2 table,
calculated in the usual way, $\tau_b^2$ for the table is equal to $\chi^2/n$.


Example 3.5
Reverting to the data of Example 3.4, suppose that it was possible
only to grade nocturia into normal and excessive with the following
results:

                                  Nocturia
                          Normal   Excessive   Totals
Assessment   Efficient      10         2         12
             Overactive      2         3          5
             Totals         12         5         17

We then have

$$\tau_b = \frac{30 - 4}{\sqrt{(12 \times 5 \times 12 \times 5)}} = +0.43$$

which is in good agreement with the value of 0.41 found in
Example 3.4.
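
Formula (3.16) and the footnote's relation between $\tau_b$ and the square contingency can be checked on the figures of Example 3.5. The following Python sketch is illustrative only, with hypothetical function names; $\chi^2$ is computed independently from observed and expected cell frequencies so that the comparison with $\tau_b^2$ is a genuine check.

```python
from math import sqrt

def tau_b_2x2(a, b, c, d):
    """tau_b of (3.16) for a 2 x 2 table with cells a, b (first row) and c, d (second row)."""
    p, q = a + b, c + d          # row totals
    x, y = a + c, b + d          # column totals
    return (a * d - b * c) / sqrt(x * y * p * q)

def chi_square_2x2(a, b, c, d):
    """Square contingency: sum of (observed - expected)^2 / expected over the four cells."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total

# Cell frequencies of Example 3.5.
tau = tau_b_2x2(10, 2, 2, 3)
chi2 = chi_square_2x2(10, 2, 2, 3)
n = 17
```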
CHAPTER 4
TESTS OF SIGNIFICANCE 


4.1 The rankings with which we deal in practice are usually 
based on a set of individuals which themselves are only samples from 
a much larger population. It is of some interest to be able to measure 
the relationship between mathematical and musical ability in a given 
class of children ; but it is of much greater interest to be able to say, 
if this class is chosen at random from a certain population of children, 
how far the results for the sample throw light on the relationship in 
that population. In this chapter we shall consider the question : 
given a value of a rank correlation in a sample, how far can we
conclude that there exists correlation in the population from which the
sample was chosen? In short, we shall try to test the significance
of observed rank correlations in the special sense of the statistical
theory of sampling.


4.2 Suppose that in the parent population there is no relationship
between the two qualities under consideration. Then if a sample
is chosen at random, any order for the quality A is just as likely to
appear with a given order for B as any other A-order. If we choose
some arbitrary order for B (it does not matter which, so we will take
the natural order $1, \ldots, n$), then all the $n!$ possible rankings of the
numbers 1 to $n$ for A are equally probable. Each accordingly has the
probability $1/n!$. (For the present we confine ourselves to the untied
case.)
Now to each of the possible arrangements of the A-ranking there
will correspond a value of $\tau$ or $\rho$. The totality of such values, $n!$ in
number, may be classified according to the actual value of $\tau$ or $\rho$,
ranging from — 1 to + 1, in what is called a frequency-distribution. 
This distribution is fundamental to our present investigation and we 
proceed to consider it in some detail. 


4.3 For a ranking of four there are 24 possible arrangements.
If the reader writes them down and correlates each with the natural
order 1, 2, 3, 4 he will find the following values of $S$:


                 Frequency of Rankings
Value of S       with the Assigned Value of S
   − 6                     1
   − 4                     3
   − 2                     5
     0                     6
     2                     5
     4                     3
     6                     1

   TOTAL                  24

is 6, attained when 
al about this value, and 95 5 
frequencies fall away to unity. 

4.4 The corresponding distribution for $n = 8$ is as follows (we
show only zero or positive values of $S$, the negative values being given
by symmetry):


                 Frequency of Rankings
Values of S      with the Assigned Value
     0                  3,836
     2                  3,736
     4                  3,450
     6                  3,017
     8                  2,493
    10                  1,940
    12                  1,415
    14                    961
    16                    602
    18                    343
    20                    174
    22                     76
    24                     27
    26                      7
    28                      1

TOTAL (of whole distribution): 40,320
In Fig. 4.1 we have 


а5 abscissa. 
in the frequency as 5 increases, 
[Fig. 4.1. Frequency polygon; horizontal axis: values of S; vertical axis: frequency (thousands).]


4.5 In the next chapter we shall see how to derive these
distributions for various values of $n$. For the present we state
without proof:

(a) that the distributions are always symmetrical. If $\tfrac{1}{2} n(n - 1)$
is even, $S$ can take only even values and there is a maximum
frequency for $S = 0$. If $\tfrac{1}{2} n(n - 1)$ is odd, $S$ can take only
odd values and there is a pair of maximum frequencies at
$S = \pm 1$;

(b) that the frequencies fall away steadily from the maximum to
the value of unity for $S = \pm \tfrac{1}{2} n(n - 1)$;

(c) that as $n$ increases, the shape of the frequency polygon tends
to that of the normal curve

$$f(x) = \frac{1}{\sigma \sqrt{(2\pi)}} \, e^{-x^2 / 2\sigma^2} \tag{4.1}$$

and that for $n$ greater than 10 this curve provides a satisfactory
approximation to the polygon. The parameter $\sigma$ of
the curve, which is equal to its standard deviation, is given
by

$$\sigma^2 = \tfrac{1}{18} n(n - 1)(2n + 5) \tag{4.2}$$

If the normal curve with this standard deviation were drawn in
Fig. 4.1 it would fall so close to the polygon as to be barely
distinguishable on this scale.


4.6 The tendency of the distribution of frequencies to normality 
is a very useful property which enables us to avoid the calculation of 
the actual distribution for n > 10. For n ≤ 10 the distributions
have been worked out and form the b. 
page 141. "This table shows t 


4.7 Let us now consider the use of these distributions in testing
the significance of τ. A test of τ is equivalent to a test of the
corresponding value of S, one being a multiple of the other, and we
shall find it arithmetically more convenient to deal with S.

If there is no connection between the two qualities, a pair of
rankings chosen at random will give some value of S lying between
the limits ±½n(n − 1). The greater part of such values will cluster
round the value zero. We shall adopt the following criterion: if the
observed value of S is such that it is very improbable that such a value
could have arisen by chance, we shall reject the hypothesis that the
qualities are independent. This amounts to rejecting the hypothesis
when the observed value of S lies in the "tails" of the distribution of S,
away from its maximum.

What is to be regarded as improbable is a matter of convention.
Ordinarily we shall regard a probability of 0.05 or one of 0.01 as small,
and one of 0.001 as very small. We sometimes speak of a "5 per cent
probability level" of S, meaning a value which is attained
or exceeded with probability 0.05, and similarly of a "1 per cent
probability level" or a 0.1 per cent probability level. The
corresponding values of S may be termed, for example, the "5 per cent
significance point". To say that an observed S lies beyond the
5 per cent significance point means that the probability of arriving
at such a value or greater (in absolute value) is less than 0.05.

If we suspect beforehand that there is positive correlation we may
prefer to consider the probability that S lies in the upper tail only;
and similarly, if negative correlation is expected, we may confine
attention to the lower tail. This would amount to considering the
probabilities that S attained or exceeded some value or fell
short of some value, as the case might be, instead of attaining or
exceeding some figure regardless of sign.


Example 4.1 


In a random sample giving a ranking of 10 we find a value of τ
equal to −0.11. Is such a value significant?
The corresponding value of S was −5. From Appendix Table 1
we see that the proportion of rankings which give a value of −5
or less and +5 or more is twice 0.364, namely 0.728. This is quite
large and we need not reject the hypothesis of independence of the
two qualities. In other words, the observed value is not significant.
It could well have arisen by chance from a population in which
mathematical and musical ability are unrelated.

Had the observed value of τ been +0.56, corresponding to a value
of S = 25, we find for the probability that |S| ≥ 25 (is greater than
or equal to 25 in absolute value) 0.028. This is very small, and we
should have concluded that the abilities were not independent in the
parent population.

We have based our inference on absolute values of S, and this
appears to be the best course, particularly when the sampling
distribution is symmetrical; but if necessary we can draw inferences based
on the actual values. If the probability that S is greater than or equal to
some value S₀ in absolute value is P₀, say

    Prob {|S| ≥ S₀} = P₀,

then the probability that S ≥ S₀, where S₀ is positive, is ½P₀, and
so is the probability that S ≤ S₀ where S₀ is negative. For instance,
with S₀ = 25,

    Prob {S ≥ S₀} = 0.014,

and with S₀ = −5

    Prob {S ≤ −5} = 0.364.

In using probability tables the reader should remember which type
is being considered. A 5 per cent point where absolute values are
employed is only a 2.5 per cent point where actual values are
concerned.
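
The tail proportions quoted in Example 4.1 can be checked without a printed table. The sketch below is illustrative only and is not the author's procedure for compiling Appendix Table 1; the names `s_frequencies` and `upper_tail` are assumptions. It builds the exact frequencies of S by repeatedly writing down the frequencies for m members m + 1 times, each copy shifted one place, and adding (the build-up rule established in Chapter 5), and then sums the upper tail.

```python
def s_frequencies(n):
    """Frequencies of S for untied rankings of n, from S = -n(n-1)/2 upward in steps of 2."""
    row = [1]                           # one member: S = 0 only
    for m in range(1, n):               # pass from m members to m + 1
        new = [0] * (len(row) + m)
        for shift in range(m + 1):      # the m + 1 places open to the new member
            for k, f in enumerate(row):
                new[k + shift] += f
        row = new
    return row


def upper_tail(n, s0):
    """Proportion of the n! rankings giving a value of S not less than s0."""
    row = s_frequencies(n)
    s_max = n * (n - 1) // 2
    hits = sum(f for k, f in enumerate(row) if 2 * k - s_max >= s0)
    return hits / sum(row)


p5 = upper_tail(10, 5)      # one-tailed proportion for S >= 5, n = 10
p25 = upper_tail(10, 25)    # one-tailed proportion for S >= 25, n = 10
print(round(2 * p5, 3), round(2 * p25, 3))
```

Doubling the one-tailed proportion gives the probability for absolute values, by the symmetry of the distribution.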


Correction for continuity 

4.8 Where n is greater than 10 we shall use the tables of areas 
under the normal curve as an approximation to the exact values 
based on the distribution of S. The appropriate areas are given in 
Appendix Table 3 and show the probability that a given multiple
of the standard deviation (not in absolute value) will be attained or
exceeded. It is a useful rule to remember that for the normal
distribution the probability is 0.05 that a value will exceed 1.96 times
the standard deviation in absolute value, 0.01 that it will exceed
2.58 times the s.d. in absolute value, and 0.001 that it will exceed
3.3 times the s.d. in absolute value.*

In using this table we have to remember that we are replacing 
the distribution of S, which is discontinuous and possesses frequencies 
separated from one another by two units, by a continuous distribu- 
tion. To make the approximation better we shall regard the 
frequency f at S, instead of being concentrated at S, as spread 
uniformly over a range from S —1 to S +1. To compare with 
areas of the normal curve we shall regard the tail as "beginning"
at S − 1, that is to say, we shall subtract unity from the observed S
if it is positive (and add unity if it is negative) before expressing
it as a multiple of the standard error. This is known as a "correction
for continuity". The following example will make the procedure
clear.


Example 4.2

In a pair of rankings of 20 the value of S observed is 58 and
accordingly τ = 0.31. Is this significant?

From (4.2) we find

$$ \sigma^{2} = \operatorname{var} S = \tfrac{1}{18}(20 \times 19 \times 45) = 950, \qquad \sigma = 30.82. $$

For S corrected for continuity we have the value 57, and thus

$$ \frac{S\ (\text{corrected})}{\sigma} = \frac{57}{30.82} = 1.850. $$

From Appendix Table 3 we see that the probability of a deviation
less than 1.850 is about 0.9678. The probability that 1.85 is obtained
or exceeded in absolute value is thus about 2(1 − 0.9678) = 0.064.
This is small, but not very small. We suspect that the observed
value of τ is significant but cannot reach a very definite conclusion.

Let us compare a value given by the normal approximation when
n = 9. Suppose the observed S is 20, corresponding to τ = 0.56.
From (4.2) we have

$$ \sigma = \sqrt{\tfrac{1}{18}(9 \times 8 \times 23)} = 9.592. $$

With a correction for continuity

$$ \frac{19}{9.592} = 1.981. $$

The probability that this will be attained or exceeded in absolute
value is seen, from Appendix Table 3, to be about 0.048. The exact
value, from Appendix Table 1, is 0.044. Had we made no correction
for continuity we should have found a value of 0.037.

* The standard deviation of a distribution of sample values (i.e. a sampling
distribution) is called the "standard error". Its square is known as the
"sampling variance", or simply the "variance", written "var".
Thus var S = σ² = (1/18) n(n − 1)(2n + 5).
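
The arithmetic of Example 4.2 may be sketched as follows. This is an illustrative Python fragment, not part of the original text; the function names are assumptions. In place of Appendix Table 3 it uses the identity 2(1 − Φ(z)) = erfc(z/√2) for the normal tail, which is available in the standard library.

```python
import math


def var_s_untied(n):
    """Variance of S for untied rankings of n, equation (4.2)."""
    return n * (n - 1) * (2 * n + 5) / 18


def two_sided_p(s, n, correct=True):
    """Normal approximation to Prob{|S| >= |s|}, optionally corrected for continuity."""
    deviation = abs(s) - 1 if correct else abs(s)
    z = max(deviation, 0) / math.sqrt(var_s_untied(n))
    return math.erfc(z / math.sqrt(2))      # equals 2 * (1 - Phi(z))


p_20 = two_sided_p(58, 20)                       # n = 20, S = 58
p_9 = two_sided_p(20, 9)                         # n = 9, S = 20, corrected
p_9_uncorrected = two_sided_p(20, 9, correct=False)
print(round(p_20, 3), round(p_9, 3), round(p_9_uncorrected, 3))
```

The correction matters more for small n, where the steps of two units in S are large compared with the standard error.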


4.9 When ties are present the above formula (4.2) for the
standard error of S requires some modification. If there are ties of
extent t in one ranking and u in the other, then the variance of the
distribution obtained by correlating one ranking with all n! possible
arrangements of the other is given by

$$ \operatorname{var} S = \tfrac{1}{18}\Big\{ n(n-1)(2n+5) - \sum_{t} t(t-1)(2t+5) - \sum_{u} u(u-1)(2u+5) \Big\} $$
$$ \qquad + \frac{1}{9n(n-1)(n-2)} \Big\{ \sum_{t} t(t-1)(t-2) \Big\}\Big\{ \sum_{u} u(u-1)(u-2) \Big\} $$
$$ \qquad + \frac{1}{2n(n-1)} \Big\{ \sum_{t} t(t-1) \Big\}\Big\{ \sum_{u} u(u-1) \Big\}. \qquad (4.3) $$

If only one ranking contains ties, so that all u's are zero,

$$ \operatorname{var} S = \tfrac{1}{18}\Big\{ n(n-1)(2n+5) - \sum_{t} t(t-1)(2t+5) \Big\}. \qquad (4.4) $$

We shall prove these results in the next chapter.

4.10 As for the untied case, the distribution of τ for any fixed
number of ties tends to normality as n increases, and there is probably
little important error involved in using the normal approximation for
n > 10, unless the ties are very extensive or very numerous, in which
case a special investigation may be necessary. For the case n ≤ 10
no complete tables are available owing to the large number of
possibilities. The distributions have, however, been tabulated by
Sillitto (1947) for any number of tied pairs or tied triplets up to and
including n = 10.


Example 4.3 
Consider the two rankings of 12: 


% ПРИ Mr eed Meee А ыты 7-0 
B К 1, а ШЕ 8i 8} 10 


"A 
We find, on summing the scores, S = 34.

In the first ranking there are ties of extent 2, 2, 3 and in the second
of extent 2, 2, 2, 2. From (4.3) we then have

$$ \operatorname{var} S = \tfrac{1}{18}\{(12 \times 11 \times 29) - 6(2 \times 1 \times 9) - (3 \times 2 \times 11)\} + 0 $$
$$ \qquad + \frac{1}{2 \times 12 \times 11}\{2(2 \times 1) + (3 \times 2)\}\{4(2 \times 1)\} = 203.30. $$

Thus with a correction for continuity

$$ \frac{34 - 1}{\sqrt{203.30}} = 2.31. $$

The chance of attaining or exceeding such a value in absolute
magnitude is about 0.021. This is small, and we incline to attribute
significance to the value of S.
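
Formula (4.3) is tedious by hand, and a short routine guards against slips. The following is a hedged Python sketch, not part of the original text; the name `var_s_tied` and its argument layout are assumptions. It is checked against rankings of 12 with ties of extent 2, 2, 3 in one ranking and 2, 2, 2, 2 in the other, which are the extents implied by the terms of the worked variance in Example 4.3.

```python
def var_s_tied(n, t_ties=(), u_ties=()):
    """Variance of S from (4.3); t_ties and u_ties list the extents of the ties in each ranking."""

    def sum1(ties):
        return sum(t * (t - 1) * (2 * t + 5) for t in ties)

    def sum2(ties):
        return sum(t * (t - 1) * (t - 2) for t in ties)

    def sum3(ties):
        return sum(t * (t - 1) for t in ties)

    v = (n * (n - 1) * (2 * n + 5) - sum1(t_ties) - sum1(u_ties)) / 18
    v += sum2(t_ties) * sum2(u_ties) / (9 * n * (n - 1) * (n - 2))
    v += sum3(t_ties) * sum3(u_ties) / (2 * n * (n - 1))
    return v


v = var_s_tied(12, t_ties=(2, 2, 3), u_ties=(2, 2, 2, 2))
print(round(v, 2), round((34 - 1) / v ** 0.5, 2))
```

With both tuples empty the routine reduces to the untied formula (4.2), and with only one tuple supplied it reduces to (4.4).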


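4.11 If one ranking degenerates to a dichotomy, as in 3.13,
with x and n − x = y members and ties in the other ranking typified
by t, we find, on substitution in (4.3), the equation

$$ \operatorname{var} S = \frac{xy}{3n(n-1)} \Big\{ n^{3} - n - \sum_{t} (t^{3} - t) \Big\}. \qquad (4.5) $$

Finally, if both rankings become dichotomies, as in 3.14, we find
on substitution in (4.5)

$$ \operatorname{var} S = \frac{x\,y\,p\,q}{n-1}, \qquad (4.6) $$

where p and q = n − p denote the numbers of members in the two
classes of the second dichotomy.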


4.12 These equations provide us with tests of significance of the
appropriate τ_a or τ_b coefficients, or rather of the values of S from
which they are derived. There is, however, one difficulty in regard
to corrections for continuity:

(a) In the case of a dichotomy and an untied ranking the interval
between successive values of S is two. The appropriate
deduction from S for continuity is one-half of the value,
namely unity.

(b) For a dichotomy and a ranking composed entirely of ties of the
same extent t the interval is 2t and the appropriate deduction
for continuity is t.

(c) When both variates are dichotomies the interval is N and the
deduction for continuity is ½N.

(d) If one variate is dichotomised and the other contains ties of
varying extents, there will be varying differences of interval
between successive values of S in various parts of the range.
In such a case we may use an approximative method, as in
Example 4.5 below.


Example 4.4
In Example 3.5 we found a value

    τ_b = 0.48

based on S = 26. For the variance we have, from (4.6),

$$ \sigma^{2} = \operatorname{var} S = \frac{12 \times 5 \times 12 \times 5}{16} = 225, \qquad \sigma = 15. $$

The correction for continuity is 17/2 and thus we have

$$ \frac{26 - 8.5}{15} = 1.167. $$

The probability that this is attained or exceeded in absolute value is
0.24, and the value of τ_b is not significant.


Example 4.5
For the same data but with one variate not dichotomised we have
found, in Example 3.4, τ_b = 0.41, based on S = 34. For the variance
we have, from (4.5),

$$ \sigma^{2} = \operatorname{var} S = \frac{12 \times 5}{3 \times 17 \times 16}\{17^{3} - 17 - (4^{3} - 4) - (2^{3} - 2) - (5^{3} - 5) - (3^{3} - 3)\} = 344.6, $$
$$ \sigma = 18.56. $$

Now for the continuity correction we note that the A-ranking
"jumps" from 2½ to 5 and this involves an interval of 5 in the
S score; for if we replace one of the E's corresponding to an A-rank of 2½
by the O corresponding to the A-rank of 5, S is reduced by 5, the first
five scores being 4 + 4 + 4 − 9 + 4 instead of 5 + 5 + 5 + 5 − 8.
Similarly the "jump" from 5 to 6½ gives an interval of 3 and so on.
We can estimate the mean interval without calculating each individual
interval in this way. The total of the S-score intervals is twice
the number of members in the ranking less the extent of the ties
involving the first and last member. In our present case this is
34 − 4 − 1 = 29, and the mean interval is thus 29/6 = 4.833. We
take half this as the correction for continuity and thus have

$$ \frac{34 - 2.417}{18.56} = 1.70, $$

giving a probability that S will be equalled or exceeded absolutely as
0.089. The value of τ_b is still not significant.
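
The two variances used in Examples 4.4 and 4.5 both follow from (4.5), since a second dichotomy is simply a ranking with two ties whose extents sum to n. The sketch below is an illustration under that reading, not part of the original text; the name `var_s_dichotomy` is an assumption, and the extents 4, 2, 5, 3 are those appearing in the worked variance of Example 4.5 (ties of extent 1 contribute nothing).

```python
def var_s_dichotomy(x, y, ties):
    """Variance of S from (4.5): a dichotomy of x and y members against a ranking with the given tie extents."""
    n = x + y
    tie_term = sum(t ** 3 - t for t in ties)
    return x * y * (n ** 3 - n - tie_term) / (3 * n * (n - 1))


var_45 = var_s_dichotomy(12, 5, ties=(4, 2, 5, 3))   # one variate not dichotomised
var_44 = var_s_dichotomy(12, 5, ties=(12, 5))        # both variates dichotomised
print(round(var_45, 1), var_44)
```

The second call agrees with the product form 12 × 5 × 12 × 5 / 16 used in Example 4.4.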

It is worth noting that in the previous example we found, for the
2 × 2 table, a value of τ_b = 0.48, which was not significant. In the
present case we find τ_b = 0.41, a slightly smaller value which has a
probability of 0.089 of being exceeded in absolute value, as against
0.24 for the value based on the 2 × 2 table. This is not a
discrepancy. In this example we have not dichotomised the second
ranking but have taken all the ranks into account. Our method
gives more play to values which might be assumed by chance, and
hence the probability of exceeding a given value in this more extended
field may well differ from the value obtained in the more restricted
domain of the double dichotomy.


The significance of ρ

4.13 Just as for τ, the set of n! values obtained by correlating
all possible (untied) rankings with an arbitrary ranking will provide
a set of values of ρ which may be used to test the significance of that
coefficient. The distribution of ρ is symmetrical and tends to
normality for large n; but it is less easy to use than that of τ for the
following reasons:

(a) The actual distribution is much more difficult to ascertain
and has been worked out fully only up to and including
n = 8;

(b) The distribution tends to normality more slowly than that
of τ, and an intermediate form is necessary to bridge the
gap between the values for n = 9 and the (rather doubtful)
point at which the normal approximation is safe;

(c) for this intermediate region no simple methods are available
for dealing with ties.


4.14 For the untied case the standard deviation of the
distribution of ρ is given by the simple form

$$ \operatorname{var} \rho = \sigma_{\rho}^{2} = \frac{1}{n-1}. \qquad (4.7) $$

The corresponding expression for S(d²) is

$$ \operatorname{var} S(d^{2}) = \Big( \frac{n^{3} - n}{6} \Big)^{2} \frac{1}{n-1}. \qquad (4.8) $$

For n > 20 it is probably accurate enough to use these expressions
on the assumption of normality in the sampling distribution. They
apply, of course, only where there is no parent correlation.


Example 4.6
In a ranking of 20 the observed value of S(d²) is 840. We then
have

$$ \rho = 1 - \frac{6 \times 840}{20 \times 21 \times 19} = 0.3684. $$

The standard error, from (4.7), is 1/√19 = 0.2294. Thus the
observed value is 0.3684/0.2294 = 1.61 times its standard error.
This is barely significant.

For such large values of S(d²) a correction for continuity is of very
minor importance, but if we wish to make one we subtract unity
from S(d²). From (4.8) we then have, for the standard error of S(d²),

$$ \frac{20^{3} - 20}{6} \cdot \frac{1}{\sqrt{19}} = 305.1. $$

Now S(d²) varies from 0 to ⅓(n³ − n) with a mean at ⅙(n³ − n),
in this case 1330. The deviation of the observed value is then
840 − 1330 = −490. With a correction for continuity the deviation
in absolute value is

$$ \frac{489}{305.1} = 1.60 $$

times the standard error, leading to the same conclusion as for ρ.
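
The two routes of Example 4.6, through ρ and through S(d²), can be laid side by side. This is a minimal Python sketch, not part of the original text; the function names are illustrative assumptions, and the formulae are (4.7) and (4.8) for the untied, null case.

```python
import math


def rho_from_sd2(sd2, n):
    """Spearman's rho from the sum of squared rank differences S(d^2)."""
    return 1 - 6 * sd2 / (n ** 3 - n)


def se_rho(n):
    """Standard error of rho in the null, untied case, from (4.7)."""
    return 1 / math.sqrt(n - 1)


def se_sd2(n):
    """Standard error of S(d^2) in the null, untied case, from (4.8)."""
    return (n ** 3 - n) / 6 / math.sqrt(n - 1)


n, sd2 = 20, 840
rho = rho_from_sd2(sd2, n)
ratio_rho = rho / se_rho(n)
mean_sd2 = (n ** 3 - n) / 6
ratio_sd2 = (abs(sd2 - mean_sd2) - 1) / se_sd2(n)   # corrected for continuity
print(round(rho, 4), round(ratio_rho, 2), round(ratio_sd2, 2))
```

The two ratios differ only through the continuity correction, which is why it is described above as of minor importance here.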


4.15 Figure 4.2 shows the distribution of S(d²) as a frequency-polygon
for n = 8. The sawlike profile of the polygon is unusual,
but the correspondence with the normal curve is evidently moderately
good, though not good enough for our purposes because in testing
significance we are mainly interested in the fit near the tails.

[FIG. 4.2. Frequency polygon of S(d²) for n = 8. Abscissa: values of S(d²), from 0 to 168.]

As we shall show in the next chapter, a better approximation for
lower values of n (say between 9 and 20) is given by the curve

$$ f(t) = \frac{k}{\big(1 + t^{2}/\nu\big)^{\frac{1}{2}(\nu + 1)}}, \qquad -\infty \le t \le \infty, \qquad (4.9) $$

which is one form of the distribution known to statisticians as
"Student's" ("Student" being the nom-de-plume of W. S. Gosset,
who introduced the expression in quite another connection). If we
write

$$ t = \rho \sqrt{\frac{n-2}{1-\rho^{2}}} \qquad (4.10) $$

the distribution of t is approximately that of (4.9), and we may thus
use it to test the significance of ρ, provided that we are given tables
showing the areas under the curve for various values of t. Such
a table is shown in Appendix Table 4. The quantity ν in this table
is, in our present case, equal to n − 2. The use of the table will be
apparent from the following example.

If ties are present no modification is required in the variance of ρ,
which remains as 1/(n − 1).
Example 4.7
For a ranking of 10, ρ is found to be 0.139. From (4.10) we have

$$ t = 0.139 \sqrt{\frac{8}{1 - (0.139)^{2}}} = 0.397. $$

From Appendix Table 4 we see that the chance of getting a value
not less than this for n = 10 (ν = 8) is 1 − 0.65 = 0.35, and thus
the chance of obtaining a value not less absolutely is 0.70. This is
not small, and the value of ρ is therefore not significant.
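
Where Appendix Table 4 is not to hand, the area under the curve (4.9) can be found numerically. The sketch below is illustrative and not part of the original text. It assumes the usual normalising constant of Student's distribution, k = Γ((ν + 1)/2) / {√(νπ) Γ(ν/2)}, which the text does not state, and it integrates by Simpson's rule; the function names are likewise assumptions.

```python
import math


def t_from_rho(rho, n):
    """Transform (4.10): t = rho * sqrt((n - 2) / (1 - rho^2))."""
    return rho * math.sqrt((n - 2) / (1 - rho * rho))


def student_density(t, nu):
    """Curve (4.9), with k taken as the usual Student normalising constant (an assumption)."""
    k = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return k * (1 + t * t / nu) ** (-(nu + 1) / 2)


def upper_tail(t, nu, steps=1000):
    """Area of the curve beyond t (t >= 0): one half less the Simpson integral over [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = student_density(0.0, nu) + student_density(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_density(i * h, nu)
    return 0.5 - total * h / 3


t = t_from_rho(0.139, 10)
tail = upper_tail(t, nu=8)
print(round(t, 3), round(tail, 2), round(2 * tail, 2))
```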


Tests in the non-null case 

4.16 The tests we have used up to this point are based on the 
distribution of correlations in a population of values obtained by 
permuting the ranks in all possible ways. This, effectively, is a test 
of the hypothesis that the qualities under consideration are inde- 
pendent in the parent population. Our test will then show whether 
an observed correlation is significant of a departure of the parent 
correlation from zero. But we may also wish to test correlations from 
a rather different viewpoint, or rather, to assign limits to the parent 
value in some probabilistic sense. For instance, suppose we arrive
at a value of τ equal to 0.6 and it is found to be significant. Can we
say between what values the true correlation probably lies? Again,
if we have a further value from a different sample equal to 0.8, also
significant, can we say that the second is significantly greater than the
first, or is the difference such as may have arisen by chance?


4.17 We shall suppose the whole population N in number to be
ranked according to the first variate in the order 1 ... N. This
can evidently be done without loss of generality. Suppose the
population is laid out in a line in this order, the ranks according to
the second variate being p_i for the ith member. Then in accordance
with our usual technique we may calculate a value of τ for this
population.

Now suppose that a sample of n members is chosen from the N.
These members will be in the natural order according to the first
variate. We may then calculate a value, which we will denote by t,
for the sample which is the value of τ for that set of n. We merely
write t instead of τ to denote that we are considering a sample value
instead of a population value. To each possible sample there will
correspond a value of t, and since there are

$$ \binom{N}{n} = \frac{N!}{n!\,(N-n)!} $$

samples there will be an equivalent number of values of t
corresponding to all possible samples.


4.18 In the next chapter we shall show that for any parent
whatsoever the distribution of t tends to normality as n, the sample
size, increases, provided that the parent τ is not too near to unity.
We shall also show that the mean value of this distribution is τ.
So far, so good, but we then encounter difficulties. If the standard
deviation of the distribution were dependent only on τ we could
easily test the observed value in the desired manner; but in fact the
standard deviation depends on other unknown quantities.


4.19 A simple illustration will emphasise the point. Consider
the ranking of 9 (according to the second variate):

    5 2 3 1 6 7 8 9 4

There are $\binom{9}{3} = 84$ possible samples of 3 from this "population".
If they are written down and the positive score P (the number of pairs
standing in the natural order) evaluated for each we find the
following distribution:

    Values of P     Frequency
        0               2
        1              15
        2              34
        3              33
    TOTAL:             84

The mean of this distribution will be found to be 13/6 and hence
the mean value of the 84 values of t is

$$ E(t) = \frac{2 \times 13/6}{3} - 1 = 0.44. $$

It will be found that, for the parent τ, P = 26 and hence

$$ \tau = \frac{2 \times 26}{36} - 1 = 0.44, $$

verifying our statement that the mean value of t is the parent τ.
The ranking

also has τ = 0.44, but the distribution of P in samples of 3 is now

    Values of P     Frequency
        0               3
        1              16
        2              29
        3              36
    TOTAL:             84

Again the mean value of t is equal to the parent τ, but the distribution
of P in the second case is different from that in the first, and its
variance is 0.734 against 0.639 in the first case.
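
The sampling experiment of 4.19 can be repeated directly for the ranking 5 2 3 1 6 7 8 9 4. The following Python sketch is illustrative and not part of the original text; `positive_score` is an assumed name for the count of pairs standing in the natural order, the quantity whose values 0 to 3 are tabulated for samples of 3.

```python
from collections import Counter
from itertools import combinations


def positive_score(seq):
    """Number of pairs of seq standing in the natural (increasing) order."""
    return sum(1 for a, b in combinations(seq, 2) if a < b)


population = (5, 2, 3, 1, 6, 7, 8, 9, 4)

# combinations() keeps members in their population order, as the text requires.
dist = Counter(positive_score(sample) for sample in combinations(population, 3))

total = sum(dist.values())
mean = sum(p * f for p, f in dist.items()) / total
variance = sum(p * p * f for p, f in dist.items()) / total - mean ** 2
mean_t = 2 * mean / 3 - 1                                  # t for each sample is 2P/3 - 1
parent_tau = 2 * positive_score(population) / 36 - 1       # 36 pairs in a ranking of 9

print(sorted(dist.items()), round(variance, 3), round(mean_t, 2), round(parent_tau, 2))
```

Supplying a different parent ranking with the same τ will in general give the same mean but a different variance, which is the difficulty the section describes.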


4.20 We are therefore in the difficulty that unless we know
something about the arrangement of the ranks in the parent
(knowledge which is usually lacking) we cannot express the variance of t
in terms of known factors. We shall, however, show in the next
chapter that for any parent the variance of t cannot exceed a certain
value, namely

$$ \operatorname{var} t \le \frac{2}{n}(1 - \tau^{2}). \qquad (4.11) $$

This result will give us a test which is on the safe side. The argument
will be clear from an example.


Example 4.8 

In a ranking of 30 a value of t is found to be 0.816. Assuming
this ranking to be a random sample, what can be said about the
value of τ in the parent?

For samples of 30 the distribution of t may be taken to be normal,
and its mean may be estimated as 0.816. From (4.11) we find that

$$ \operatorname{var} t \le \tfrac{2}{30}\{1 - (0.816)^{2}\} = 0.022276, $$

giving a standard error of 0.149 or less.

Now the probability of a deviation from the mean of 1.96 times
the standard deviation or greater (in absolute value) is 0.05. We may
therefore say that at the worst, the probability is 0.95 that the true
value of τ lies within (0.149 × 1.96) = 0.292 of 0.816, i.e. that the
probability is less than or equal to 0.05 that the true value lies outside
the range 0.524 to 1.0. Instead of setting limits to the value in the
usual statistical manner with an assigned degree of probability, we set
limits with a maximum to the degree of probability; or, to put it
another way, we set outside limits to the range of the parent τ. This
may make our inference unnecessarily stringent, but we err on the
safe side in the sense that we run no danger of attributing significance
to non-significant results, though we may in some cases fail to discern
significance where it really exists.


4.21 We have stated the argument above by relation to standard
errors, the form in which it is usually introduced in elementary
treatments of statistics.* In order to apply (4.11), however, we need to
replace the unknown parent value τ by the sample value t, as has been
done in the previous example. Such a procedure may be avoided
by recourse to the theory of confidence intervals. If x is the normal
deviate corresponding to a probability level of P per cent (i.e. if the
probability is 0.0P that there will occur on random sampling a
deviation from the mean of x times the standard deviation or greater
in absolute value), then, to this probability at the most, we may
assert that

$$ \tau_{1} \le \tau \le \tau_{2} \qquad (4.12) $$

where τ₁, τ₂ are the roots of

$$ (t - \tau)^{2} = x^{2}\, \frac{2}{n}\, (1 - \tau^{2}), $$

i.e. are given by

$$ \tau = \frac{t \pm x\sqrt{2/n}\,\sqrt{1 + 2x^{2}/n - t^{2}}}{1 + 2x^{2}/n}. \qquad (4.13) $$

Example 4.9 


For example, in the data of Example 4.8, n = 30, t = 0.816.
The normal deviate corresponding to a probability of 0.05 is 1.96.
Substitution in (4.13) then gives

$$ \tau = \frac{t \pm 0.50607\sqrt{1.2561 - t^{2}}}{1.2561} = 0.34 \ \text{or}\ 0.96. $$

Thus we may assert that τ lies between 0.34 and 0.96, being sure of
correctness in at least 95 per cent of the cases on the average. This
is a more accurate method than that of 4.20. The limits are, of course,
still maxima.
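
The limits of Example 4.9 are the two roots of a quadratic in τ and are easily computed. The sketch below is an illustration, not part of the original text; the name `tau_limits` and the symbol `x` for the normal deviate are assumptions. It solves (t − τ)² = (2x²/n)(1 − τ²) for τ.

```python
import math


def tau_limits(t, n, x=1.96):
    """Roots of (t - tau)^2 = (2 x^2 / n)(1 - tau^2): outside limits for the parent tau."""
    c = 2 * x * x / n
    half_width = math.sqrt(c * (1 + c - t * t))
    return (t - half_width) / (1 + c), (t + half_width) / (1 + c)


lower, upper = tau_limits(0.816, 30)
print(round(lower, 2), round(upper, 2))
```

Unlike the limits 0.816 ± 0.292 of Example 4.8, these are not symmetrical about t and cannot pass beyond unity.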


* See, for instance, Introduction, Chapters 19 and 20. For the theory of
confidence intervals see Kendall, The Advanced Theory of Statistics, Vol. II,
Chapter 19.
Example 4.10

In a sample of 20 a value of τ equal to 0.8 is observed. A second
sample of 20 gives a value of 0.6. Is there any indication that the
samples are from different material, or could the different values of τ
arise by chance?

For a maximum to the variance in the first case we have

$$ \operatorname{var} t = \frac{2}{n}(1 - \tau^{2}) = 0.036, $$

giving a standard error of 0.19. The second value differs by about
this amount, and consequently we cannot attribute significance to
the result.

Alternatively, we might argue that the variance of the second
value is 0.064. Thus, a maximum to the variance of their difference
(being equal to the sum of the variances) is 0.100, the corresponding
standard error being 0.32. This is greater than the actual difference
of 0.2, and again we conclude that the difference is not significant.

Generally, if we have a number of values of S (even from rankings
of different extents) we may add them together and test the
significance of the whole set against the sums of the variances of the individual
rankings. The validity of this procedure rests on the fact that the
variance of the sum of independent variables is the sum of their
variances. It has been found very useful in field work.


4.22 The foregoing examples illustrate one rather disappointing
feature of rank correlation coefficients, namely the comparatively
large standard error which they possess. Whatever τ may be, the
standard error is of the order of √(2/n). This shortcoming, however,
is a feature of most correlation coefficients. The standard error of
the product-moment coefficient in normal samples, for instance, is
(1 − ρ²)/√n and thus is, in general, of order 1/√n. It is clearly
impossible to locate the parent correlation very closely unless the
ranking contains 30 or 40 members. This provides a useful caution
against attributing reality to correlation coefficients calculated from
rankings of small extent, unless several sample values are available.

For instance, with a ranking of 32 members the maximum
standard error is ¼√(1 − τ²) and if t is near zero we cannot, on the
above basis, locate the parent τ in a narrower range than twice the
standard error, or ±0.5. If t is, say, 0.8 the range becomes narrower,
but the band of doubt surrounding the true value still extends from
0.5 to 1.0.
4.23 It is not possible to improve very much on the maximum
limits given by (4.11) or (4.13), as we shall show in the next chapter.
We shall see, however, that very substantial improvements are 
possible if the original rankings are available and the investigator 
has patience to carry out the necessary arithmetic. Apart from this 
CHAPTER 5
PROOF OF THE RESULTS OF CHAPTER 4 


5.1 The series of formulae we have used in the last chapter for
testing the significance of τ and ρ requires four types of result, viz.

(a) the derivation of exact distributions for low values of n;

(b) the proof that distributions tend to normality for large n;

(c) the derivation of the means and variances of the limiting 
distributions ; 

(d) the derivation of the corrections for continuity. 


We treat them in that order and conclude the chapter with a more 
detailed analysis of the non-null case. 


Exact distribution of τ in the null case

5.2 If we correlate a fixed ranking of n members with the n!
possible rankings (excluding ties) we obtain the same distribution
whatever the fixed ranking; for all possibilities occur. We therefore
lose no generality in supposing our fixed ranking to be the natural
order 1 ... n. Let u(n, S) be the number of values of S in the
aggregate of n! possible values obtained by correlating this with all
possible rankings.

Consider one such ranking and the effect of inserting a new
member (n + 1) at the various places in it, from the first (preceding
the first member p₁) to the last (following the last member p_n).
Inserting (n + 1) at the beginning will add −n to S; inserting it
at the second place adds −(n − 2); at the third place −(n − 4)
and so on, the last place adding n. It follows that

$$ u(n+1, S) = u(n, S-n) + u(n, S-n+2) + u(n, S-n+4) + \dots $$
$$ \qquad + u(n, S+n-4) + u(n, S+n-2) + u(n, S+n). \qquad (5.1) $$

This recursion formula enables us to ascertain the frequency-distribution
of S for n + 1 when we know that for n; and hence
we may build up the series of distributions from the simpler cases
for n = 2, 3, etc.


5.3 In practice the procedure may be simplified. For n = 2
there are two values of S, viz. −1 and +1. If we write the
frequencies down three times, one under the other, moving a stage
to the right each time, we get

    1   1
        1   1
            1   1
    ---------------
    1   2   2   1

and the sum gives the frequencies of S for n = 3, the actual values
ranging from −½n(n − 1) by units of two to ½n(n − 1), i.e. through
the values −3, −1, +1, +3.

Similarly for n = 4 we write down the array for n = 3 four times
and sum, as follows:

    1   2   2   1
        1   2   2   1
            1   2   2   1
                1   2   2   1
    ---------------------------
    1   3   5   6   5   3   1

the frequencies being for values of S from −6 by units of 2 to +6.

The validity of this rule will be evident from the previous section.
To any set of S-values for a given n, there will correspond in the
distribution for n + 1 a similar set with values of S increased by
−n, −(n − 2), ..., (n − 2), n. What we have done is to write
these frequencies down, one row for −n, one for −(n − 2) and so
on, and to sum them.


5.4 Even this procedure may be simplified. We form a
numerical triangle as follows:

    n    Frequencies

    1    1
    2    1   1
    3    1   2   2   1
    4    1   3   5   6   5   3   1
    5    1   4   9  15  20  22  20  15   9   4   1
    etc.

In this array a number in the rth row is the sum of the number
immediately above it and the (r − 1) members to the left of that
number; e.g. in the fifth row 22 = 3 + 5 + 6 + 5 + 3.
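
The triangle rule of 5.4 translates into a few lines of code. The following Python sketch is illustrative and not part of the original text; the names `next_row` and `triangle` are assumptions. Each entry of row r is formed as the entry directly above it plus the r − 1 entries to the left of that one, absent entries counting as zero.

```python
def next_row(prev, r):
    """Row r of the triangle from row r - 1, by the rule of 5.4."""
    length = len(prev) + r - 1
    row = []
    for k in range(length):
        first = max(0, k - (r - 1))          # leftmost entry of the window above
        last = min(k, len(prev) - 1)         # the entry directly above, if it exists
        row.append(sum(prev[j] for j in range(first, last + 1)))
    return row


def triangle(rows):
    """The first `rows` rows of the triangle, starting from the single entry for n = 1."""
    out = [[1]]
    for r in range(2, rows + 1):
        out.append(next_row(out[-1], r))
    return out


for n, row in enumerate(triangle(5), start=1):
    print(n, row)
```

Row n lists the frequencies of S from −½n(n − 1) upward in steps of two, so the same routine reproduces the distributions tabulated in Chapter 4 for n = 4 and n = 8.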


5.5 The distribution of S is symmetrical; for to any ranking
giving a particular value of S there will correspond a conjugate
ranking giving −S. The mean value of S is therefore zero, as we
should expect. That S ranges from −½n(n − 1) to ½n(n − 1) is
evident from the fact that these are the extreme values given when
the ranking is the inverse of the natural order or the natural order
itself. Further, the interval between successive values of S for
given n is 2. This follows from (5.1), or perhaps more simply by
the consideration that an interchange of a pair of members in a
ranking alters the positive score by +1 or −1 and hence the
difference between positive and negative scores (which is S) by +2
or −2.


Tendency of τ to normality in the null case

5.6 It remains to determine the variance of S and to prove that
the distribution tends to normality as n increases.
In the notation of Chapter 2 let us write

$$ c_{ij} = a_{ij} b_{ij} \qquad (5.2) $$

where a and b refer to the scores in the different rankings.
Put

$$ c = \sum_{i,j=1}^{n} c_{ij}. \qquad (5.3) $$

Then

$$ S = \tfrac{1}{2} c. \qquad (5.4) $$

Now

$$ \sum_{j=1}^{n} a_{ij} = n + 1 - 2i \qquad (5.5) $$

for this is the score associated with the ith member and is
(n − i) − (i − 1). Further,

$$ \sum_{i,j=1}^{n} a_{ij}^{2} = n(n-1) \qquad (5.6) $$

for a_ij² is +1 and this sum is merely the number of possible ways
of choosing a pair from n members, each pair being counted twice,
once as XY and once as YX. It follows from (5.5) that

$$ \sum_{i,j=1}^{n} a_{ij} = \sum_{i=1}^{n} (n + 1 - 2i) = n(n+1) - 2\sum_{i=1}^{n} i = 0, $$

as is to be expected.
We also find

$$ \sum_{i,j,k=1}^{n} a_{ij} a_{ik} = \sum_{i=1}^{n} (n + 1 - 2i)^{2} $$
$$ \qquad = \sum_{i=1}^{n} \{(n+1)^{2} - 4(n+1)i + 4i^{2}\} $$
$$ \qquad = n(n+1)^{2} - 2n(n+1)^{2} + \tfrac{2}{3} n(n+1)(2n+1) $$
$$ \qquad = \tfrac{1}{3} n(n^{2} - 1). \qquad (5.7) $$


Now, writing E to denote mean values on summation over all possible
permutations, we have

$$ E(c) = E \sum_{i,j=1}^{n} (a_{ij} b_{ij}) = \sum_{i,j} E(a_{ij} b_{ij}). $$

Since a_ij and b_ij are independent, any fixed value of one being taken
with all possible values of the other, and since the mean value E of any
term a_ij or b_ij is zero, we have

$$ E(c) = 0, \qquad (5.8) $$

confirming that the mean of c (or of S) is zero. For the variance of c
we require

$$ E(c^{2}) = E\Big\{ \sum_{i,j} (a_{ij} b_{ij}) \Big\}^{2} $$
$$ \qquad = E \sum_{i,j} (a_{ij}^{2} b_{ij}^{2}) + E \sideset{}{'}\sum_{i,j,k} (a_{ij} b_{ij} a_{ik} b_{ik}) + E \sideset{}{''}\sum_{i,j,k,l} (a_{ij} b_{ij} a_{kl} b_{kl}) \qquad (5.9) $$

where Σ′ denotes summation over values for which j ≠ k and Σ″ over
values for which i ≠ k and j ≠ l. We evaluate these terms
separately.


(i) The term Σ″ vanishes. To demonstrate this it is enough to
show that $\sum'' a_{ij} a_{kl}$ vanishes. Now (Σ relating to summation
over all values of the suffixes, different letters in the sums on
the right denoting different values)
$$\begin{aligned}
\sum a_{ij} a_{kl} = {} & \sum{}'' a_{ij} a_{kl} + \sum a_{ij} a_{il} + \sum a_{ij} a_{li} + \sum a_{ij} a_{jl} \\
& + \sum a_{ij} a_{lj} + \sum a_{ij} a_{ij} + \sum a_{ij} a_{ji}.
\end{aligned}$$
The terms after the first on the right vanish in pairs in virtue of such
relations as $a_{ij} = -a_{ji}$. The term on the left vanishes because
(summation extending over all values) $\sum a_{ij} a_{kl} = \sum a_{ij} \sum a_{kl}$
and $\sum a_{ij} = 0$. Consequently the first term on the right also vanishes,
which is the required result.

(ii) The term $E(\sum a_{ij}^2 b_{ij}^2)$ is the mean of $\sum a_{ij}^2 b_{ij}^2$
over all permutations of one ranking. In the sum of all terms
contributing to this mean each $a_{ij}$ occurs with $b_{ij}$ and $b_{ji}$.
The sum is thus $2 \sum a_{ij}^2 \sum b_{ij}^2$ multiplied by the number of
ways in which other members can vary when two are fixed, i.e. (n − 2)!.
The term is thus
$$\frac{2(n-2)!}{n!} \sum a_{ij}^2 \sum b_{ij}^2.$$

(iii) Similarly the second term on the right in (5.9) is
$$\frac{4(n-3)!}{n!} \sum{}' a_{ij} a_{ik} \sum{}' b_{ij} b_{ik},$$
for if three suffixes are fixed the remainder can vary in (n − 3)! ways;
and of the four suffixes two can be identical in 4 ways.
Furthermore
$$\sum{}' a_{ij} a_{ik} = \sum a_{ij} a_{ik} - \sum a_{ij}^2.$$


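Substituting in (5.9) we thus find
$$\begin{aligned}
E(c^2) = {} & \frac{4}{n(n-1)(n-2)} \Big(\sum a_{ij} a_{ik} - \sum a_{ij}^2\Big) \Big(\sum b_{ij} b_{ik} - \sum b_{ij}^2\Big) \\
& + \frac{2}{n(n-1)} \sum a_{ij}^2 \sum b_{ij}^2.
\end{aligned} \tag{5.10}$$
Now substituting from (5.6) and (5.7) we find
$$\begin{aligned}
E(c^2) &= \frac{4}{n(n-1)(n-2)} \{\tfrac{1}{3} n(n^2 - 1) - n(n-1)\}^2 + \frac{2}{n(n-1)} \{n(n-1)\}^2 \\
&= \tfrac{2}{9} n(n-1)(2n+5).
\end{aligned} \tag{5.11}$$
It follows that
$$E(S^2) = \operatorname{var} S = \tfrac{1}{18} n(n-1)(2n+5) \tag{5.12}$$
and
$$\operatorname{var} \tau = \frac{2(2n+5)}{9n(n-1)}. \tag{5.13}$$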


(5.12) is the result announced in (4.2).

5.7 If ties are present, equation (5.10) remains valid but
expressions (5.6) and (5.7) require modification. In place of (5.6)
we have
$$\sum a_{ij}^2 = n(n-1) - \sum_t t(t-1), \tag{5.14}$$
where summation $\sum_t$ takes place over the various ties. This follows
simply from the consideration that for a pair of tied ranks $a_{ij} = 0$
and consequently the sum of the squares of contributions from a tied
set is of the same form as for the ranking as a whole.
In place of (5.7) we have
$$\sum a_{ij} a_{ik} = \tfrac{1}{3} n(n^2 - 1) - \tfrac{1}{3} \sum_t t(t^2 - 1). \tag{5.15}$$
This is not quite so obvious. Consider the effect of tying a set of
ranks. The contribution to the sum on the left of (5.15) will be
unchanged if the suffix i falls outside this set. If i is inside the set
and j, k are outside, again the contribution is unchanged. If both
j, k fall inside no contribution arises, and therefore we have to subtract
the term ⅓t(t² − 1). If one falls inside and one outside the
contribution remains unchanged, for it was zero in the original untied
case, each possible pair occurring once to give + 1 and once to give − 1.
If we substitute from (5.14) and (5.15) in (5.10) we find, for the
variance of S when the rankings contain ties typified by t, u,
$$\begin{aligned}
\operatorname{var} S = {} & \tfrac{1}{18} \Big\{n(n-1)(2n+5) - \sum_t t(t-1)(2t+5) - \sum_u u(u-1)(2u+5)\Big\} \\
& + \frac{1}{9n(n-1)(n-2)} \Big\{\sum_t t(t-1)(t-2)\Big\} \Big\{\sum_u u(u-1)(u-2)\Big\} \\
& + \frac{1}{2n(n-1)} \Big\{\sum_t t(t-1)\Big\} \Big\{\sum_u u(u-1)\Big\}.
\end{aligned} \tag{5.16}$$
This is the result given in (4.3). Equations (4.4), (4.5) and (4.6)
follow at once.
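The variance formulae (5.12) and (5.16) can be checked on small cases by averaging over every permutation of one ranking. The following is a minimal sketch of such a check; the function names are illustrative only, and the closed form is taken exactly as written in (5.16), so it needs n of at least 3.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def sign(x):
    return (x > 0) - (x < 0)


def score_S(x, y):
    """Score S: concordant minus discordant pairs, tied pairs scoring 0."""
    n = len(x)
    return sum(sign(x[j] - x[i]) * sign(y[j] - y[i])
               for i in range(n) for j in range(i + 1, n))


def exact_var_S(x, y):
    """Variance of S when y is permuted in all n! ways against a fixed x."""
    values = [score_S(x, p) for p in permutations(y)]
    mean = Fraction(sum(values), len(values))
    return Fraction(sum(v * v for v in values), len(values)) - mean ** 2


def formula_var_S(x, y):
    """Right-hand side of (5.16); it reduces to (5.12) when nothing is tied."""
    n = len(x)
    t, u = list(Counter(x).values()), list(Counter(y).values())
    s2 = lambda g: sum(k * (k - 1) for k in g)
    s3 = lambda g: sum(k * (k - 1) * (k - 2) for k in g)
    s5 = lambda g: sum(k * (k - 1) * (2 * k + 5) for k in g)
    var = Fraction(n * (n - 1) * (2 * n + 5) - s5(t) - s5(u), 18)
    var += Fraction(s3(t) * s3(u), 9 * n * (n - 1) * (n - 2))
    var += Fraction(s2(t) * s2(u), 2 * n * (n - 1))
    return var
```

Exact rational arithmetic is used so that the enumeration and the formula can be compared for equality rather than to within rounding error.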


5.8 In proving that the distribution of S tends to normality we
shall follow a procedure which, with a few trivial changes, will also
prove the normality of S(d²) and will form an introduction to more
general results, due to Daniels (1944), concerning the limiting forms
of the general rank correlation coefficients as defined in Chapter 2.

We prove that the moments of the distribution of S tend to
those of the normal distribution. It will then follow from what is
known as the Second Limit Theorem that the distribution tends to
normality.* Since the distribution of S is symmetrical moments
about the mean of odd order vanish. We have only to show that
for the even moments
$$\mu_{2r} \to \frac{(2r)!}{2^r r!} (\mu_2)^r. \tag{5.17}$$

* Kendall, Advanced Theory, Vol. I, 4.24.


Consider the mean value of $(\sum a_{ij} b_{ij})^{2r}$. When this expression
is expanded it will consist of terms such as
$$\sum{}'' a_{ij} a_{kl} \ldots a_{mn} \cdot b_{ij} b_{kl} \ldots b_{mn}.$$
When summations extend over values of suffixes which exclude
certain values (e.g. i ≠ k) we can replace them by complete
summations. Our expanded expression will then, apart from numerical
terms, consist of terms such as
$$\sum a_{ij} a_{kl} \ldots \; \sum b_{ij} b_{kl} \ldots \tag{5.18}$$
where certain suffixes may be the same or "tied" and others will be
different and "free". Consider a term in which the 4r suffixes of
a are tied in pairs, e.g.
$$\sum a_{ij} a_{ik} a_{lm} a_{ln} \ldots \tag{5.19}$$
There are then 3r independent suffixes. Now $\sum a_{ij} a_{ik}$ is of
order ⅓n³ and consequently (5.19) is of order $(\tfrac{1}{3} n^3)^r$. In the
expansion of $(\sum a_{ij} b_{ij})^{2r}$ terms of this type will arise with a
frequency which is the number of ways of tying the 2r paired suffixes.
This is the product of three factors, viz. (i) the number of ways of
picking r a's from 2r, namely $\binom{2r}{r}$; (ii) the number of ways of
associating these with the remaining r factors, namely r!; (iii) $2^r$
arising from the fact that either suffix of a particular a may be tied.
The numerical coefficient is then
$$\binom{2r}{r} \, r! \, 2^r = \frac{(2r)! \, 2^r}{r!}.$$
Furthermore, if 3r suffixes are fixed the remaining members of the
ranking can vary in (n − 3r)! ways. Hence $\mu_{2r}$ contains a term
$$\frac{(n-3r)!}{n!} \, \frac{2^r (2r)!}{r!} \, (\tfrac{1}{3} n^3)^{2r} \sim \frac{(2r)!}{2^r r!} \, (\tfrac{4}{9} n^3)^r.$$
But $\mu_2$ is of order (4/9)n³ (as we see by putting r = 1) and hence
$\mu_{2r}$ contains a term
$$\frac{(2r)!}{2^r r!} (\mu_2)^r.$$
Our demonstration will be complete if we show that all other terms
are of lower order in n.

If a term in (5.18) contains a pair of suffixes neither of which appears
elsewhere, the term vanishes because $\sum a_{ij} = 0$. Consider a term,
other than one of type (5.19), in which every pair has at least one
tied suffix. The summation extends then over not more than 3r − 1
suffixes and cannot be of greater order than $(n^{3r-1})^2$. If 3r − 1 or
fewer suffixes are fixed the order is then not greater than
$$\frac{(n - 3r + 1)!}{n!} \, n^{6r-2} \sim n^{3r-1}$$
and is thus lower than that of the term already found.


Distribution of ρ in the null case


5.9 The actual distribution of ρ is much more difficult to
determine than that of τ, and no method is known for constructing
the distribution for n + 1 from that for n. Consider the array
$$\begin{array}{ccccccc}
0 & 1 & 2 & \cdots & n-3 & n-2 & n-1 \\
-1 & 0 & 1 & \cdots & n-4 & n-3 & n-2 \\
-2 & -1 & 0 & \cdots & n-5 & n-4 & n-3 \\
\vdots & & & & & & \vdots \\
-(n-2) & -(n-3) & -(n-4) & \cdots & -1 & 0 & 1 \\
-(n-1) & -(n-2) & -(n-3) & \cdots & -2 & -1 & 0
\end{array} \tag{5.20}$$
Any permissible set of deviations between the order 1, 2, ..., n
and an arbitrary order is given by selecting n members from the array
so that no two of them appear in the same row or column.
Consider then the array
$$\begin{array}{ccccccc}
a^{0} & a^{1} & a^{4} & \cdots & a^{(n-3)^2} & a^{(n-2)^2} & a^{(n-1)^2} \\
a^{1} & a^{0} & a^{1} & \cdots & a^{(n-4)^2} & a^{(n-3)^2} & a^{(n-2)^2} \\
\vdots & & & & & & \vdots \\
a^{(n-2)^2} & a^{(n-3)^2} & a^{(n-4)^2} & \cdots & a^{1} & a^{0} & a^{1} \\
a^{(n-1)^2} & a^{(n-2)^2} & a^{(n-3)^2} & \cdots & a^{4} & a^{1} & a^{0}
\end{array} \tag{5.21}$$
The indices in (5.21) are the squares of the entries in (5.20), and if we
expand (5.21) we shall get the totality of values of S(d²). By
"expanding" we mean developing the array by selecting, in the n!
possible ways, n factors from (5.21) such that no two have a row or
column in common, multiplying them for each set of factors and
summing the n! resultants. This method was used to arrive at the
distributions which form the basis of Appendix Table 2.


5.10 Certain simple properties of the distribution of S(d²) are
evident from elementary considerations. First, any value of S(d²)
must be even; for Σ(d) = 0 and hence the number of odd values of d,
and thus d², must be even. Secondly, the distribution is symmetrical,
for to any value of S(d²) there corresponds a value ⅓(n³ − n) − S(d²)
derived from the conjugate ranking (2.9). Thirdly, the mean of the
distribution is ⅙(n³ − n). Fourthly, it ranges from 0 to ⅓(n³ − n).
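The expansion of the array and the four properties just listed can be illustrated for small n by direct enumeration. This is only a sketch (the function name is an assumption, and enumeration over all n! rankings is practical only for small n); each permutation corresponds to one choice of a factor from every row and column of the array.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def sd2_distribution(n):
    """Frequencies of S(d^2) over all n! rankings of n members.

    A permutation p selects the entry in row i, column p[i] of the array;
    the index of the resulting product is the sum of squared deviations.
    """
    freq = Counter()
    for p in permutations(range(n)):
        freq[sum((i - q) ** 2 for i, q in enumerate(p))] += 1
    return dict(sorted(freq.items()))


def check_properties(n):
    """Return True if the four elementary properties hold for this n."""
    dist = sd2_distribution(n)
    top = (n ** 3 - n) // 3
    total = sum(dist.values())
    mean = Fraction(sum(v * f for v, f in dist.items()), total)
    return (all(v % 2 == 0 for v in dist)                         # even values
            and all(dist[v] == dist.get(top - v) for v in dist)    # symmetry
            and mean == Fraction(n ** 3 - n, 6)                    # mean
            and min(dist) == 0 and max(dist) == top)               # range
```

For n = 4, for instance, `sd2_distribution(4)` lists the 24 rankings by their value of S(d²).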


5.11 We may determine the variance of ρ in the manner of 5.6.
From the way in which we derived (5.10) in terms of the scores a and b
it will be clear that the equation is equally true for c when the scores
relate to Spearman's ρ, for without loss of generality we may write
$$a'_{ij} = j - i, \tag{5.22}$$
where we have added a prime to distinguish it from the score where τ
is concerned. We have at once
$$\sum_{j=1}^{n} a'_{ij} = \tfrac{1}{2} n(n + 1 - 2i) \tag{5.23}$$
and
$$\begin{aligned}
\sum_{i,j=1}^{n} a'^{\,2}_{ij} &= \sum_{i,j} (i - j)^2 \\
&= 2n \sum_{i=1}^{n} i^2 - 2\Big(\sum_{i=1}^{n} i\Big)^2 \\
&= \tfrac{1}{6} n^2 (n^2 - 1)
\end{aligned}$$
and
$$\sum_{i,j,k=1}^{n} a'_{ij} a'_{ik} = \tfrac{1}{12} n^3 (n^2 - 1). \tag{5.24}$$
Substitution in (5.10) now gives
$$E(c^2) = \frac{n^4 (n-1)(n+1)^2}{36}. \tag{5.25}$$
Remembering that
$$\rho = 1 - \frac{6 S(d^2)}{n^3 - n}$$
and that from (2.9)
$$c = \tfrac{1}{6} n^2 (n^2 - 1) - n S(d^2),$$
we find from (5.25)
$$\operatorname{var} \rho = E(\rho^2) = \frac{1}{n-1}, \tag{5.26}$$
which is the formula (4.7).*


5.12 By similar methods it may be shown that the fourth
moment of ρ is given by
$$\mu_4 = \frac{3(25n^3 - 38n^2 - 35n + 72)}{25n(n+1)(n-1)^3}. \tag{5.27}$$
The third moment, of course, is zero in virtue of the symmetry of the
distribution.
Consider now the distribution
$$dF = \frac{1}{B(\tfrac{1}{2}, \tfrac{1}{2} n - 1)} (1 - \rho^2)^{\frac{1}{2}(n-4)} \, d\rho, \qquad -1 \le \rho \le 1. \tag{5.28}$$
This has zero moments of odd order and for the second and fourth
moments
$$\mu_2 = \frac{B(\tfrac{3}{2}, \tfrac{1}{2} n - 1)}{B(\tfrac{1}{2}, \tfrac{1}{2} n - 1)} = \frac{1}{n-1}, \tag{5.29}$$
$$\mu_4 = \frac{B(\tfrac{5}{2}, \tfrac{1}{2} n - 1)}{B(\tfrac{1}{2}, \tfrac{1}{2} n - 1)} = \frac{3}{n^2 - 1}. \tag{5.30}$$
Thus the second moment is the same as that of ρ. The fourth
moment is not equal to that of ρ, but their difference is of order
$$\frac{3}{n^2 - 1} \Big\{1 - \frac{25n^3 - 38n^2 - 35n + 72}{25n(n-1)^2}\Big\} \sim -\frac{36}{25n^3},$$
i.e. is of lower order in n than the moments themselves. To a fairly
close approximation, therefore, we may regard ρ as having the
distribution (5.28). This will be more accurate than the normal
approximation. If we put
$$t = \rho \sqrt{\frac{n-2}{1-\rho^2}}$$
the distribution becomes
$$dF = \frac{1}{\sqrt{(n-2)} \, B(\tfrac{1}{2}, \tfrac{1}{2} n - 1)} \, \frac{dt}{\{1 + t^2/(n-2)\}^{\frac{1}{2}(n-1)}}, \tag{5.31}$$
the form of the distribution known as "Student's" which is tabulated
in Appendix Table 4. This is the origin of the rule given in 4.15 for
testing ρ in this distribution.

* That this formula requires no modification for tied ranks follows from
(7.14) with m = 2, bearing in mind that (6.6) is to be modified in the tied
case. Alternatively, with c as defined in (5.3) and scores $x_i - x_j$,
$y_i - y_j$ it may be shown that
$E(c^2) = 4n^2 \sum (x - \bar{x})^2 \sum (y - \bar{y})^2 / (n - 1)$, whence (5.26)
follows whether ranks are tied or not.
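The second and fourth moments of ρ quoted in (5.26) and (5.27) can be compared with an exhaustive enumeration for small n. The sketch below is illustrative only; the function names are assumptions, and the moments of (5.28) are taken in the closed forms 1/(n − 1) and 3/(n² − 1) given above.

```python
from fractions import Fraction
from itertools import permutations


def rho_moments(n):
    """Exact second and fourth moments of Spearman's rho over all n! rankings."""
    denom = n ** 3 - n
    m2 = m4 = Fraction(0)
    count = 0
    for p in permutations(range(n)):
        sd2 = sum((i - q) ** 2 for i, q in enumerate(p))
        rho = 1 - Fraction(6 * sd2, denom)
        m2 += rho ** 2
        m4 += rho ** 4
        count += 1
    return m2 / count, m4 / count


def mu4_formula(n):
    """Fourth moment of rho as given in (5.27)."""
    return Fraction(3 * (25 * n ** 3 - 38 * n ** 2 - 35 * n + 72),
                    25 * n * (n + 1) * (n - 1) ** 3)


def mu4_beta(n):
    """Fourth moment of the approximating distribution (5.28), from (5.30)."""
    return Fraction(3, n * n - 1)
```

Comparing `mu4_formula(n)` with `mu4_beta(n)` for increasing n shows the difference shrinking faster than the moments themselves, which is the basis of the approximation.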


Joint distribution of τ and ρ

5.13 The tendency of ρ to normality may be demonstrated by
making a few alterations in the proof of the normality of τ in 5.8.
Both results are particular cases of a more general theorem due to
Daniels (1944) which we now prove.

We shall show, in fact, that the joint distribution of τ and ρ tends
to the bivariate normal form as n tends to infinity. Indeed, under
certain non-restrictive conditions, any two coefficients of the general
type defined in 2.2 tend to joint bivariate normality.

Suppose that a, a′ refer to scores in two such coefficients, and
similarly for b and b′. Then we shall show that the product-moments
of the joint distribution of the corresponding c and c′ tend to those
of the normal bivariate form.

The pth order product-moments of this joint distribution are sums
of terms containing
$$\sum{}' a_{ij} a'_{kl} \ldots \; \sum{}' b_{ij} b'_{kl} \ldots \tag{5.32}$$
where groups of suffixes within the Σ′'s may be tied or free. Each
Σ′ involves products of p scores which may belong to either system.
Every such Σ′ is in turn a linear combination of the corresponding
Σ having the same suffixes and other Σ's in which additional tied
suffixes appear. No Σ may contain a pair of free suffixes attached
to one score, for it would then vanish by virtue of the fact that
$\sum a_{ij} = 0$.
We discuss first the moments of even order. Let p = 2m.
Consider a Σ in which the 2m scores are divided into m pairs each
having one tied suffix, so that there are in all 3m independent suffixes,
e.g.
$$\sum a_{ij} a_{ik} \, a_{lr} a_{ls} \ldots \tag{5.33}$$
It may be written as
$$\Big(\sum a_{ij} a_{ik}\Big)^{\lambda} \Big(\sum a_{ij} a'_{ik}\Big)^{\mu} \Big(\sum a'_{ij} a'_{ik}\Big)^{\nu}, \tag{5.34}$$
where λ + μ + ν = m and λ, μ, ν are the number of times the scores
are paired in the combinations indicated.

As is always possible, suppose the numerically largest value of $a_{ij}$
to be made equal to unity. We now impose the condition that
$\sum a_{ij} a_{ik}$ is of order n³, whether $a_{ij}$ and $a_{ik}$ belong to the
same or different systems of scores. This, in particular, is satisfied
by τ and ρ, when max. $a_{ij}$ = 1. With this condition it is seen that
Σ's of the above type are of order $n^{3m}$.

Other ways of tying suffixes give Σ's of lower order of magnitude.
For the order of magnitude of the expression is not reduced on
replacing each $a_{ij}$ by + 1; consequently if further suffixes are tied
the order of Σ is made less than $n^{3m}$ since there are fewer than 3m
summations from 1 to n. It follows that the dominant term in a Σ′
is the corresponding Σ having the same array of suffixes.

Moreover, every non-vanishing Σ involving 3m independent
suffixes can only be a permutation of type (5.33), while those with
more than 3m different suffixes must all vanish. This will be clear
by considering how the 3m suffixes can be arrayed between the 2m
scores. Begin by assigning 3m different suffixes at random among
the 4m available places. At least m scores will receive their full
complement of suffixes, all of which will be different. There cannot
be more than m such completed scores, for if Σ is not to vanish at
least one suffix of each complete pair must be tied, and this can only
be done by repeating one suffix from every complete pair in each of
the remaining places to be filled, of which there are only m. We
are thus led to a permutation of the type of Σ discussed above. If
there had been more than 3m different suffixes to begin with, there
would not have remained sufficient empty places to prevent the
existence of at least one score with a pair of free suffixes. Hence all
Σ's with more than 3m different suffixes must vanish.

Any 2mth product-moment is the sum of terms like
$$A \, \frac{(n-f)!}{n!} \sum{}' a_{ij} a_{kl} \ldots \; \sum{}' b_{rs} b_{tu} \ldots$$
where f is the number of independent suffixes in the Σ′ and A is
a coefficient which is of unit order so far as n is concerned. From
the preceding argument the maximum value of f is 3m, in which case
the term is of order $n^{-3m} \times n^{3m} \times n^{3m} = n^{3m}$. When f < 3m
the term is of order not greater than
$n^{-3m+1} \times (n^{3m-1})^2 = n^{3m-1}$ and hence such terms may be
neglected. Write
$$\begin{aligned}
h_{20} &= \sum a_{ij} a_{ik} \sum b_{ij} b_{ik}, \\
h_{11} &= \sum a_{ij} a'_{ik} \sum b_{ij} b'_{ik}, \\
h_{02} &= \sum a'_{ij} a'_{ik} \sum b'_{ij} b'_{ik}.
\end{aligned} \tag{5.35}$$
Then if lower-order terms are neglected, the even product-moment
$\mu_{rs}$, r + s = 2m, is given by the sum of terms like
$$n^{-3m} A_{\lambda\mu\nu} \, h_{20}^{\lambda} h_{11}^{\mu} h_{02}^{\nu}, \qquad 2\lambda + \mu = r, \quad \mu + 2\nu = s, \tag{5.36}$$
over all possible values of λ, μ, ν. The coefficient $A_{\lambda\mu\nu}$ is
determined by the following argument. Consider a Σ whose array of
suffixes is such that it can be factorised as
$(\sum a_{ij} a_{ik})^{\lambda} (\sum a_{ij} a'_{ik})^{\mu} (\sum a'_{ij} a'_{ik})^{\nu}$.
Its suffix pairs can be permuted in r! s! ways within the sets of scores
of the two types a and a′, but of these $\lambda! \, (2!)^{\lambda} \, \mu! \, \nu! \, (2!)^{\nu}$
give essentially the same Σ. The suffixes within pairs attached to each
score may also be rearranged in $2^{2m}$ ways without affecting the result.
Hence
$$A_{\lambda\mu\nu} = \frac{r! \, s! \, 2^{2m}}{\lambda! \, (2!)^{\lambda} \, \mu! \, \nu! \, (2!)^{\nu}} = \frac{r! \, s! \, 2^{2m-\lambda-\nu}}{\lambda! \, \mu! \, \nu!}. \tag{5.37}$$
From (5.35), (5.36) and (5.37) it follows that the calculation of $\mu_{rs}$ is
tantamount to determining the coefficient of $t_1^r t_2^s$ in
$$r! \, s! \, \frac{2^m}{n^{3m} \, m!} \, (h_{20} t_1^2 + 2h_{11} t_1 t_2 + h_{02} t_2^2)^m. \tag{5.38}$$

We now consider the odd moments. For ρ and τ these vanish by
symmetry, but even in the more general case it can be shown that
they are negligible to order $n^{-1/2}$. For a Σ containing 2m + 1 scores
cannot have more than 3m + 1 suffixes. This follows by a similar
argument to that employed above for the even-order moments.
Hence the order of magnitude of any (2m + 1)th moment is at most
$n^{3m+1}$. The 2mth moments were shown to be of order $n^{3m}$. A
(2m + 1)th moment is of order at most $n^{\frac{3}{2}(2m+1)} \times n^{-1/2}$. The odd
moments are therefore of lower order ($n^{-1/2}$) compared with the even
moments.

Finally, it follows from (5.38) that the moment $\mu_{rs}$ is the coefficient
of $t_1^r t_2^s / (r! \, s!)$ in
$$\exp\Big\{\frac{2}{n^3} (h_{20} t_1^2 + 2h_{11} t_1 t_2 + h_{02} t_2^2)\Big\}. \tag{5.39}$$
This is the moment-generating function of a bivariate normal
distribution (Kendall, Advanced Theory, Vol. I, 15.12). The result
is proved.

5.14 It is of some interest to consider the product-moment
correlation between ρ and τ. In exactly the same manner as in the
derivation of (5.10) we find, for the mean value E(cc′),
$$\begin{aligned}
E(cc') = {} & \frac{4}{n(n-1)(n-2)} \Big\{\sum a_{ij} a'_{ik} - \sum a_{ij} a'_{ij}\Big\} \Big\{\sum b_{ij} b'_{ik} - \sum b_{ij} b'_{ij}\Big\} \\
& + \frac{2}{n(n-1)} \sum a_{ij} a'_{ij} \sum b_{ij} b'_{ij}.
\end{aligned} \tag{5.40}$$
In the particular case when c relates to S and c′ to S(d²)
we find
$$\sum_{i,j,k} a_{ij} a'_{ik} = \tfrac{1}{6} n^2 (n^2 - 1), \tag{5.41}$$
$$\sum_{i,j} a_{ij} a'_{ij} = \tfrac{1}{3} n(n^2 - 1), \tag{5.42}$$
and on substitution in (5.40) we find
$$E(cc') = \tfrac{1}{9} n^2 (n-1)(n+1)^2. \tag{5.43}$$
Thus the product-moment correlation between S and S(d²), which
is the same as that between τ and ρ, is given by
$$\frac{E(cc')}{\sqrt{\{E(c^2) \, E(c'^2)\}}} = \frac{2(n+1)}{\sqrt{\{2n(2n+5)\}}}. \tag{5.44}$$
For large n this tends to unity. Even for moderate n it is quite high.
For n = 5 it is 0.980 and for n = 20 it is 0.990.


5.15 We thus expect that the numerical values of τ and ρ might
in practice be found to bear a constant ratio. Since the ratio of their
variances is
$$\frac{\operatorname{var} \tau}{\operatorname{var} \rho} = \frac{4(2n+5)}{18n} \sim \frac{4}{9},$$
we expect this ratio to be √(4/9) = ⅔. This is the origin of the statement
in 1.20 that ρ is often about 50 per cent greater than τ. This will not
hold, of course, when τ is near unity.
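The correlation (5.44) and the values quoted for n = 5 and n = 20 can be checked numerically. The sketch below is a hedged illustration with assumed function names: it enumerates all rankings for a small n, computes the exact squared correlation between S and ρ, and evaluates the closed form.

```python
from fractions import Fraction
from itertools import permutations
from math import sqrt


def squared_corr_tau_rho(n):
    """Exact squared correlation of S and rho over all n! rankings, and its sign."""
    pairs = []
    for p in permutations(range(n)):
        S = sum((p[j] > p[i]) - (p[j] < p[i])
                for i in range(n) for j in range(i + 1, n))
        sd2 = sum((i - q) ** 2 for i, q in enumerate(p))
        pairs.append((S, 1 - Fraction(6 * sd2, n ** 3 - n)))
    m = len(pairs)
    cov = sum(S * r for S, r in pairs) / m          # both means are zero
    var_S = Fraction(sum(S * S for S, _ in pairs), m)
    var_rho = sum(r * r for _, r in pairs) / m
    return cov ** 2 / (var_S * var_rho), cov > 0


def corr_formula(n):
    """Closed form 2(n + 1) / sqrt{2n(2n + 5)} from (5.44)."""
    return 2 * (n + 1) / sqrt(2 * n * (2 * n + 5))


def sd_ratio(n):
    """Ratio of the standard deviations of tau and rho, which tends to 2/3."""
    return sqrt(4 * (2 * n + 5) / (18 * n))
```

The squared correlation is returned as an exact fraction, so it can be compared with the square of (5.44) without rounding.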


Corrections for continuity 


5.16 We turn now to the question of corrections for continuity,
the rules for which were given in 4.12.

(a) Consider first the case where one ranking is untied and the
other contains ties, and may in the extreme case be a dichotomy.
We may imagine the untied ranking in the natural order and the other
in any arbitrary order. If we interchange a pair of neighbouring
members in the untied ranking the only scores affected are those
involving both members. Either the two ranks in the second ranking
are untied or they are tied. In the first case the score S alters by
two units, in the latter it remains unchanged. However many ties
there are in the second ranking (short of the whole set being completely
tied) there must be one interchange of neighbouring members in the
first ranking which alters S by 2. Thus all intervals between
successive values of S in the distribution of S are two units, and the
appropriate correction for continuity is one unit.

(b) If now the first ranking consists entirely of ties of extent t and
the second is dichotomised, a change of two neighbours from different
tied groups can at the most (and will at least for some rankings of the
second variate) alter S by 2t, and the continuity correction is t.

(c) If both variates are dichotomised then, as in 3.14,
S = ad − bc. The least change that can result is an increase or
decrease of a unit in a, and in such a case (say an increase) the increase
in S is
$$(a + 1)(d + 1) - (b - 1)(c - 1) - (ad - bc) = a + b + c + d = N.$$
The continuity correction is thus ½N.

(d) When both rankings contain ties it is not possible to lay down
any general rule for continuity corrections. If the point is important
some special consideration such as that in Example 4.5 is necessary.


The non-null case 

5.17 We turn now to the more difficult case where parental
correlation exists. We denote the parent value by τ and the sample
value by t. We prove in the first place that the mean value of t over
all possible samples is τ.
Consider the $\binom{N}{n}$ samples of n from a population of N members.
Any particular pair of members will occur in $\binom{N-2}{n-2}$ samples, that
is, all pairs occur equally frequently in the totality of all samples.
Thus the total score for all samples is $\binom{N-2}{n-2}$ times the score for
the population, say Σ. Thus
$$E(t) = \frac{\binom{N-2}{n-2} \, \Sigma}{\binom{N}{n} \, \tfrac{1}{2} n(n-1)} = \frac{\Sigma}{\tfrac{1}{2} N(N-1)} = \tau. \tag{5.45}$$

5.18 Next we derive an expression for the variance of t. Let
$c^{(n)}$ be the quantity c for a sample ranking of n and c be the parent
value. Then
$$t = \frac{c^{(n)}}{n(n-1)}, \qquad \tau = \frac{c}{N(N-1)} \tag{5.46}$$
and
$$c^{(n)} = \sum_{(n)} c_{ij}, \tag{5.47}$$
where $\sum_{(n)}$ denotes summation over those values of i and j occurring
in the sample.
We require E(t²), so consider
$$\sum_s \{c^{(n)}\}^2 = \sum_s \sum_{(n)} c_{ij} c_{kl}, \tag{5.48}$$
$\sum_s$ denoting summation over all selections of the sample of n from
the population of N members. Let us enumerate the number of
ways in which $c_{ij} c_{kl}$ and similar products with tied suffixes occur
in the sum.

(i) When i, j, k, l are all different, the term $c_{ij} c_{kl}$ may occur with
$\binom{N-4}{n-4}$ selections of the remaining members of the sample, and the
contribution of such terms to $\sum_s$ is
$$\binom{N-4}{n-4} \sum{}' c_{ij} c_{kl},$$
Σ′ as usual denoting summation over unequal values of i, j, k, l from
1 to N.

(ii) The term $c_{ij} c_{ik}$ similarly occurs in $\binom{N-3}{n-3}$ ways and there
are four ways of tying one suffix. The contribution is thus
$$4 \binom{N-3}{n-3} \sum{}' c_{ij} c_{ik}.$$

(iii) Terms like $c_{ij}^2$ similarly contribute
$$2 \binom{N-2}{n-2} \sum{}' c_{ij}^2.$$
Thus
$$\sum_s \{c^{(n)}\}^2 = \binom{N-4}{n-4} \sum{}' c_{ij} c_{kl} + 4 \binom{N-3}{n-3} \sum{}' c_{ij} c_{ik} + 2 \binom{N-2}{n-2} \sum{}' c_{ij}^2.$$
Expressing the Σ′'s in terms of Σ's and dividing by $\binom{N}{n}$ we find
$$\begin{aligned}
E\{c^{(n)}\}^2 = {} & \frac{n^{[4]}}{N^{[4]}} \Big(\sum c_{ij} c_{kl} - 4 \sum c_{ij} c_{ik} + 2 \sum c_{ij}^2\Big) \\
& + \frac{4 n^{[3]}}{N^{[3]}} \Big(\sum c_{ij} c_{ik} - \sum c_{ij}^2\Big) + \frac{2 n^{[2]}}{N^{[2]}} \sum c_{ij}^2,
\end{aligned} \tag{5.49}$$
where $n^{[r]} = n(n-1) \ldots (n-r+1)$. Since $\sum c_{ij}^2 = N(N-1)$
and $\sum c_{ij} c_{kl} = c^2$ the variance of t for given τ and n is seen to
depend on $\sum c_{ij} c_{ik} = \sum c_i^2$, where $c_i = \sum_j c_{ij}$.

Let N become large. The sums c and $\sum c_i^2$ are respectively
of order N² and N³, so if we write
$$\tau_i = \frac{c_i}{N} \tag{5.50}$$
we find
$$E(t^2) = \frac{(n-2)(n-3)}{n(n-1)} \tau^2 + \frac{4(n-2)}{n(n-1)} \frac{\sum \tau_i^2}{N} + \frac{2}{n(n-1)}. \tag{5.51}$$
Thus, in the limit,
$$\operatorname{var} t = \frac{4(n-2)}{n(n-1)} \operatorname{var} \tau_i + \frac{2}{n(n-1)} (1 - \tau^2). \tag{5.52}$$
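Both the result E(t) = τ and the finite-population expression (5.49) lend themselves to a direct check on a small parent ranking. The following sketch is an illustration only: the function names are assumptions, the parent is taken untied, and every sample of n is enumerated.

```python
from fractions import Fraction
from itertools import combinations


def sign(x):
    return (x > 0) - (x < 0)


def falling(n, r):
    """n(n - 1)...(n - r + 1), written n^[r] in the text."""
    out = 1
    for k in range(r):
        out *= n - k
    return out


def check_sampling_moments(parent, n):
    """Return tau, the mean of t, the mean of {c^(n)}^2 and the value of (5.49)."""
    N = len(parent)
    c = [[sign(j - i) * sign(parent[j] - parent[i]) for j in range(N)]
         for i in range(N)]
    c_total = sum(map(sum, c))
    tau = Fraction(c_total, N * (N - 1))
    # c^(n) for each sample: sum of c_ij over ordered pairs within the sample
    samples = [sum(c[i][j] for i in s for j in s)
               for s in combinations(range(N), n)]
    mean_t = Fraction(sum(samples), len(samples) * n * (n - 1))
    mean_c2 = Fraction(sum(v * v for v in samples), len(samples))
    sum_ci2 = sum(sum(row) ** 2 for row in c)
    sum_cij2 = N * (N - 1)                      # valid for an untied parent
    ratio = lambda r: Fraction(falling(n, r), falling(N, r))
    rhs = (ratio(4) * (c_total ** 2 - 4 * sum_ci2 + 2 * sum_cij2)
           + 4 * ratio(3) * (sum_ci2 - sum_cij2)
           + 2 * ratio(2) * sum_cij2)
    return tau, mean_t, mean_c2, rhs
```

Called with a parent of nine members and n = 4, the function examines 126 samples; the second and fourth returned values can be compared with the first and third respectively.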


5.19 Consider now $c_{ij} = a_{ij} b_{ij}$. Keeping $b_{ij}$ equal to ± 1,
let the a's assume any values, subject to the conditions
$$\sum a_{ij}^2 = N(N-1), \qquad \sum a_{ij} b_{ij} = c = N(N-1)\tau.$$
The stationary values of $\sum c_i^2$ occur when the a's satisfy the relations
$$(c_i + c_j) b_{ij} - \lambda a_{ij} - \mu b_{ij} = 0, \tag{5.53}$$
where λ and μ are undetermined multipliers. Multiplying by $b_{ij}$
and summing for all j, we find
$$c_i = \frac{\mu(N-1) - c}{N - 2 - \lambda}.$$
Thus, unless the $c_i$ are all to be equal, in which case $\sum c_i^2$ is a
minimum, λ and μ must take the values
$$\lambda = N - 2, \qquad \mu = c/(N-1).$$
Multiplying (5.53) by $a_{ij}$ and summing over i and j, we have
$$2 \sum c_i^2 - \lambda N(N-1) - \mu c = 0,$$
whence it follows that $\sum c_i^2$ cannot exceed
$$\tfrac{1}{2} \{N(N-1)(N-2) + c^2/(N-1)\}.$$
For large N this implies that
$$\sum \tau_i^2 / N \le \tfrac{1}{2} (1 + \tau^2).$$
Hence
$$\operatorname{var} \tau_i \le \tfrac{1}{2} (1 - \tau^2)$$
and thus, from (5.52),
$$\operatorname{var} t \le \Big\{\frac{2(n-2)}{n(n-1)} + \frac{2}{n(n-1)}\Big\} (1 - \tau^2) = \frac{2}{n} (1 - \tau^2), \tag{5.54}$$
which is equation (4.11) of the previous chapter.
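The upper limit found for Σ c_i² can be tested against every untied parent ranking of a small population. In this sketch (function names are assumptions) the a's are restricted to ± 1, which is a special case of the conditions stated above, and the inequality is cleared of fractions so that only integers are compared.

```python
from itertools import permutations


def sign(x):
    return (x > 0) - (x < 0)


def row_scores(parent):
    """c_i = sum over j of c_ij for an untied parent ranking."""
    N = len(parent)
    return [sum(sign(j - i) * sign(parent[j] - parent[i]) for j in range(N))
            for i in range(N)]


def bound_holds(parent):
    """Check sum c_i^2 <= (1/2){N(N-1)(N-2) + c^2/(N-1)} in integer form."""
    N = len(parent)
    ci = row_scores(parent)
    c = sum(ci)
    lhs = 2 * (N - 1) * sum(v * v for v in ci)
    rhs = N * (N - 1) ** 2 * (N - 2) + c * c
    return lhs <= rhs
```

For the natural order the two sides are equal, since every c_i is then N − 1.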


5.20 The form of this result suggests using a transformation
$$w = \sin^{-1} t. \tag{5.55}$$
To the same order of approximation we may take w as being normally
distributed about ω = sin⁻¹ τ, and the variance of w will obey the
relation
$$\operatorname{var} w \le \frac{2}{n}, \tag{5.56}$$
which has the advantage of being independent of ω. It is not known
whether the distribution in this case is nearer to normality than that
of the original t.


5.21 The proof that the distribution of t tends to normality for
large n follows, in essentials, the demonstrations given earlier in this
chapter. We will merely outline it.

Write
$$g_{ij} = c_{ij} - c/N^2$$
so that
$$g_{ij} = g_{ji}, \qquad \sum g_{ij} = 0 \qquad \text{and} \qquad g_{ii} = -c/N^2 = -(N-1)\tau/N.$$
The rth moment of $c^{(n)}$ about its mean value is $E\{\sum_{(n)} g_{ij}\}^r$,
so consider
$$\sum_s \Big\{\sum_{(n)} g_{ij}\Big\}^r = \sum_s \sum_{(n)} g_{ij} g_{kl} \ldots \tag{5.57}$$
An essential condition is that
$$\sum g_{ij} g_{ik} = O(N^3), \tag{5.58}$$
which is true only if 1 − τ² is of the order of 1, so that the tendency
to normality may break down for high correlations.

In the manner of 5.13, for the moment of order 2m the major
term arises from expressions like $(\sum g_{ij} g_{ik})^m$ and other terms are of
lower order in n. With 3m suffixes assigned there are $\binom{N-3m}{n-3m}$
ways of selecting the remaining n − 3m members, and the suffixes
can be tied in
$$\frac{(2m)! \, 2^{2m}}{m! \, (2!)^m}$$
ways to give the same result. Dividing by $\binom{N}{n}$ and noting that for
large N and n
$$\binom{N-3m}{n-3m} \Big/ \binom{N}{n} \sim \frac{n^{3m}}{N^{3m}}$$
we find for the major term in the moment of order 2m
$$\frac{n^{3m}}{N^{3m}} \, \frac{(2m)! \, 2^m}{m!} \Big(\sum g_{ij} g_{ik}\Big)^m,$$
which is of order $n^{3m}$. By the same argument terms with f < 3m
different suffixes are of order $n^f$ and may be neglected. Thus,
$$\mu_{2m} \sim \frac{n^{3m}}{N^{3m}} \, \frac{(2m)! \, 2^m}{m!} \Big(\sum g_{ij} g_{ik}\Big)^m \sim \frac{(2m)!}{2^m m!} (\mu_2)^m. \tag{5.59}$$
Further, $\mu_{2m+1}$ is of order $n^{-1/2}$ in comparison. The tendency to
normality follows. The variance of $c^{(n)}$ is
$$\frac{4n^3}{N^3} \sum g_{ij} g_{ik} = 4n^3 \operatorname{var} \tau_i \tag{5.60}$$
and the variance of t accordingly is (4/n) var τ_i, which agrees with
(5.52) to the order considered.


5.22 We now give reasons for supposing that the limits to the
variance of t given by (5.54) cannot very substantially be narrowed.
Consider a ranking such as

    5 2 3 1 6 7 8 9 4

The number of positive pairs is 26, so τ = 0.44. Let us transform
this so as to bring the 1 to the beginning of the ranking, but move
the 9 so as to preserve the score at 26. The 1 passes over three
members to go to the beginning and hence adds 3 to the score.
The 9 must therefore proceed to the left over three members so as to
subtract 3, and we reach the ranking

    1 5 2 3 9 6 7 8 4

Proceeding similarly with the 2 we reach

    1 2 5 9 3 6 7 8 4

Had the 9 been contiguous to the 1 and incapable of proceeding
further to the left we should have moved the 8, and so on.
Continuing the process we ultimately arrive at

    1 2 3 4 9 8 7 6 5

All the lower numbers are in the right order and the others in the
inverse order. We may call this the "canonical order" for given S.
It is not always possible to reduce a given ranking to canonical order,
but there cannot be more than one individual out of place.

If the parent ranking is inverted τ becomes − τ. We may reduce
this to the canonical form and re-invert the result, so that the
coefficient is again τ. This ranking we may call the "inverse
canonical form".
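The reduction to canonical order described above keeps the number of positive pairs fixed at each step, and this is easy to confirm. In the sketch below the four rankings are those of the worked illustration; the second of them is inferred from the moves described in the text (the 1 carried to the front, the 9 moved three places to the left), so it should be read as an assumption rather than a quotation.

```python
def positive_pairs(ranking):
    """Number of pairs standing in the natural order (the positive score)."""
    n = len(ranking)
    return sum(ranking[i] < ranking[j]
               for i in range(n) for j in range(i + 1, n))


def tau(ranking):
    """Kendall's coefficient for an untied ranking against the natural order."""
    n = len(ranking)
    total = n * (n - 1) // 2
    return (2 * positive_pairs(ranking) - total) / total


steps = [
    [5, 2, 3, 1, 6, 7, 8, 9, 4],   # starting ranking
    [1, 5, 2, 3, 9, 6, 7, 8, 4],   # 1 to the front, 9 three places left
    [1, 2, 5, 9, 3, 6, 7, 8, 4],   # 2 moved up, 9 one place left
    [1, 2, 3, 4, 9, 8, 7, 6, 5],   # canonical order
]
scores = [positive_pairs(r) for r in steps]
```

Every entry of `scores` is the same, so τ is unchanged throughout the reduction.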


5.23 Now consider the canonical case when there are N members
together, R at the beginning in the right order and N − R in the
inverse order. If we select n − j members from the R and j from
the N − R the value of S for the sample of n is ½n(n − 1) − j(j − 1)
and the relative frequency of U = ½n(n − 1) − S is
$$\binom{R}{n-j} \binom{N-R}{j} \Big/ \binom{N}{n}.$$
Now suppose that N tends to infinity and R/N tends to a limit p.
The relative frequency of U (= j(j − 1)) then tends to
$\binom{n}{j} p^{n-j} q^j$, where q = 1 − p. The mean value of U is then
$$E(U) = \sum_j j(j-1) \binom{n}{j} p^{n-j} q^j = n(n-1) q^2,$$
and since
$$E(U) = \tfrac{1}{2} n(n-1)(1 - \tau)$$
we must have
$$\tau = 1 - 2q^2.$$
The variance of U is found to be
$$\operatorname{var} U = 4n(n-1) p q^2 \{nq + \tfrac{1}{2}(1 - 3q)\}$$
and so
$$\operatorname{var} t = \frac{16 p q^2 \{nq + \tfrac{1}{2}(1 - 3q)\}}{n(n-1)}. \tag{5.61}$$
If the inverted parent is reduced to canonical form, giving ratios
p′ and q′, we shall have
$$\tau = -(1 - 2q'^2)$$
and
$$\operatorname{var} t' = \frac{16 p' q'^2 \{nq' + \tfrac{1}{2}(1 - 3q')\}}{n(n-1)}. \tag{5.62}$$
Then, since q² + q′² = 1,
$$\operatorname{var} t' - \operatorname{var} t = \frac{16(n-2)}{n(n-1)} (q' - q)(1 - q)(1 - q').$$
When τ is positive q′ > q and then var t′ > var t. Taking the
inverse canonical ranking when τ > 0 and the direct canonical
ranking when τ < 0, we find for the variance of t when n is large
$$\operatorname{var} t \sim \frac{4}{n} (1 + |\tau|) \{\sqrt{2(1 + |\tau|)} - (1 + |\tau|)\}. \tag{5.63}$$
The ratio of this quantity to the upper limit (2/n)(1 − τ²) varies from
2(√2 − 1) = 0.83 when τ = 0 to 1 when τ = 1. Evidently the
upper limit to the variance cannot be much improved, since an actual
parent ranking has been found whose variance approximates to it
for all values of τ when n is not too small.
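The variance of U = j(j − 1) for a binomial j, and the ratio of the large-n variance to the upper limit, can both be evaluated directly. The sketch below uses assumed function names and takes the closed forms in the shape var U = 4n(n − 1)pq²{nq + ½(1 − 3q)} and var t ~ (4/n)(1 + |τ|){√(2(1 + |τ|)) − (1 + |τ|)}; these are the forms adopted in this section and should be treated as a reading of the source rather than a quotation.

```python
from fractions import Fraction
from math import comb, sqrt


def var_U_binomial(n, q):
    """Variance of U = j(j - 1), j binomial(n, q), by direct summation."""
    p = 1 - q
    probs = [comb(n, j) * p ** (n - j) * q ** j for j in range(n + 1)]
    m1 = sum(pr * j * (j - 1) for j, pr in enumerate(probs))
    m2 = sum(pr * (j * (j - 1)) ** 2 for j, pr in enumerate(probs))
    return m2 - m1 ** 2


def var_U_closed(n, q):
    """Closed form 4n(n-1) p q^2 {nq + (1 - 3q)/2}."""
    p = 1 - q
    return 4 * n * (n - 1) * p * q ** 2 * (n * q + Fraction(1, 2) * (1 - 3 * q))


def limiting_ratio(tau):
    """n var t for large n divided by the upper limit 2(1 - tau^2), |tau| < 1."""
    a = 1 + abs(tau)
    return 4 * a * (sqrt(2 * a) - a) / (2 * (1 - tau ** 2))
```

Passing q as a `Fraction` keeps the binomial sum exact; `limiting_ratio(0)` gives the value 2(√2 − 1) and the ratio rises towards 1 as τ approaches unity.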


More exact treatment in the non-null case

5.24 In proving that t tends to normality in the non-null case
we have neglected terms of order 1/√n and this suggests that
the normal approximation may hold good for large n but may be
indifferent for small or moderate n. We will examine briefly the
possibility of improving the approximation in the case of moderate n.
Looking again at (5.52),
$$\operatorname{var} t = \frac{4(n-2)}{n(n-1)} \operatorname{var} \tau_i + \frac{2}{n(n-1)} (1 - \tau^2), \tag{5.64}$$
we see that the actual variance of t for large N depends on the
unknown functions τ_i and τ. In the absence of exact knowledge of
these quantities we may estimate them from the sample, using the
sample values of $c_i$ and c instead of the unknown parent values.
In doing so, however, it is better to modify our formulae slightly so
as to remove bias. The reader will recall that in the ordinary theory
of statistics it is better to use the estimator $\sum (x - \bar{x})^2 / (n - 1)$ for
the variance, rather than the actual sample variance $\sum (x - \bar{x})^2 / n$,
because the average of the former over all samples is the parent
variance. For similar reasons it is better not to substitute the sample
values of $c_i$ and c in (5.64) but to use a formula which, averaged over
all samples, gives the exact form of var t. Such an unbiassed
formula is
$$\operatorname{var} t = \frac{1}{n(n-1)(n-2)(n-3)} \Big\{4 \sum c_i^2 - \frac{2(2n-3)}{n(n-1)} c^2 - 2n(n-1)\Big\}, \tag{5.65}$$
where the $c_i$ and c values are sample values. This will give us a
"best" estimate of var t.


5.25 We may take matters a little further by considering the
third moment of t so as to allow for departures from normality in the
sampling distribution. Reference may be made to Daniels and
Kendall (1947) for the details. We merely quote the result that if
$$\gamma_1 = \frac{\mu_3(t)}{\{\mu_2(t)\}^{3/2}} \tag{5.66}$$
then the frequency-distribution of
$$x = \frac{t - \tau}{\sqrt{\{\mu_2(t)\}}}$$
is*
$$f(x) = \frac{1}{\sqrt{(2\pi)}} e^{-\frac{1}{2} x^2} \Big\{1 + \frac{\gamma_1}{6} (x^3 - 3x) + O(n^{-1})\Big\}. \tag{5.67}$$

* This is a stronger result than would be obtained by the usual expansion
of a frequency function in a Gram-Charlier series based on the first three
moments only. For such expansions see Kendall, Advanced Theory, Vol. I,
6.23-6.33.

If ξ is the normal deviate whose chance of being exceeded is P(ξ),
the chance of x exceeding ξ is
$$F(\xi) = P(\xi) + \frac{\gamma_1}{6} (\xi^2 - 1) \frac{e^{-\frac{1}{2} \xi^2}}{\sqrt{(2\pi)}}.$$
If X is the correct limit such that F(X) = P(ξ) it is readily proved
by successive approximation that
$$X = \xi + \frac{\gamma_1}{6} (\xi^2 - 1)$$
to order $n^{-1/2}$. For example the 5 per cent value of ξ is ± 1.96. The
corresponding value of X is
$$\pm 1.96 + \frac{(1.96)^2 - 1}{6} \gamma_1 = \pm 1.96 + 0.47 \gamma_1. \tag{5.69}$$
The corresponding 1 per cent value of X is
$$\pm 2.58 + 0.94 \gamma_1. \tag{5.70}$$
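The adjustment of the normal deviate for skewness can be tried numerically. The sketch below rests on an assumption about the form of the tail area, namely F(ξ) = P(ξ) + (γ₁/6)(ξ² − 1)φ(ξ) with φ the normal ordinate, which is the reading adopted here for the garbled expression in the source; the function names are illustrative.

```python
from math import erfc, exp, pi, sqrt


def upper_tail(x):
    """P(x): chance that a standard normal deviate exceeds x."""
    return 0.5 * erfc(x / sqrt(2))


def skew_tail(x, gamma1):
    """Assumed tail area F(x) = P(x) + (gamma1/6)(x^2 - 1) phi(x)."""
    phi = exp(-0.5 * x * x) / sqrt(2 * pi)
    return upper_tail(x) + gamma1 / 6 * (x * x - 1) * phi


def adjusted_deviate(xi, gamma1):
    """X = xi + (gamma1/6)(xi^2 - 1), correct to the first order in gamma1."""
    return xi + gamma1 * (xi * xi - 1) / 6


coef_5pc = adjusted_deviate(1.96, 1.0) - 1.96    # multiplier of gamma1 at 5 per cent
coef_1pc = adjusted_deviate(2.58, 1.0) - 2.58    # multiplier of gamma1 at 1 per cent
```

Since the adjustment is only a first-order approximation, `skew_tail(adjusted_deviate(xi, g), g)` agrees with `upper_tail(xi)` closely but not exactly for moderate values of γ₁.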
The following example illustrates the use of these results. 


Example 5.1 

A sample of 30 objects was graded as shown in Table 5.1 according
to two qualities M and A. We require to assess the correlation in
the parent population.


TABLE 5.1

     M   A        M   A        M   A
     1   5       11  17       21  21
     2   4       12  13       22  29
     3   9       13  24       23  23
     4   3       14  14       24  19
     5   6       15   1       25  28
     6   2       16  12       26  20
     7  15       17  10       27   7
     8  18       18  30       28  26
     9   8       19  22       29  27
    10  11       20  16       30  25


The correlation t is found to be + 0.490.
(a) Consider first of all the maximum confidence limits given by
(4.13). The 5 per cent limits are
    − 0.02 ≤ τ ≤ 0.80.
(b) The sin⁻¹ transformation of 5.20 gives
    w = sin⁻¹ t = 0.512
    0.01 ≤ τ ≤ 0.85,
with results which are not very different from those of (a).
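The limits in (a) and (b) can be reproduced from t = 0.490 and n = 30. The sketch below makes two assumptions that should be read as such: that the limits referred to as (4.13) are the roots of (t − τ)² = z²(2/n)(1 − τ²), which is what the upper bound (5.54) suggests, and that the arcsine limits are w ± z√(2/n) transformed back by the sine. The function names are illustrative.

```python
from math import asin, pi, sin, sqrt


def max_limits(t, n, z=1.96):
    """Roots in tau of (t - tau)^2 = z^2 (2/n)(1 - tau^2)."""
    k = z * z * 2 / n
    a, b, c = 1 + k, -2 * t, t * t - k
    root = sqrt(b * b - 4 * a * c)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)


def arcsine_limits(t, n, z=1.96):
    """Limits for tau from w = arcsin t with var w taken as 2/n."""
    w, half = asin(t), z * sqrt(2 / n)
    lower = sin(max(w - half, -pi / 2))
    upper = sin(min(w + half, pi / 2))
    return lower, upper


limits_a = max_limits(0.490, 30)
limits_b = arcsine_limits(0.490, 30)
```

Rounded to two decimal places, `limits_a` and `limits_b` give the intervals quoted in (a) and (b).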


TABLE 5.2 


яыкзана=аазцка‚ядеякаддявадыыз 
т | 
CCC 
БЕ а IEEE 1 +++ е | 
+++++++++ +441 4-4 е | 
КОРИЦЕ АНА It Ime ТЕ td tet E т ава 
ЖА Е 1 1 IS1 +++ 


су 


es {E4 
Е ETE Pett ETE | Do EE +++ 
МАЊИНА ве је iita 
j ee 
e т 
Ea E TIE EE ota | а 
ЕЕЕ ЕЕ EI ETT 
EERE EE +++ о 1 1111] ГІГІМ 


E D ESL TESTER Ce е ер 
шана TIS рр 
ЛАА А ТА fe a рад рана PA 
ПРВИ Ts оре завере Es ERE et 
РАЈА E obo p bee DI LE ҮШ ETI EE рана 
peser ap ame TG eee 
АИ err АП Пипи TER e р 
И ee 
F bep Depp PERSE eaa 
E АИИ TRISTIS [furore Па 
F 
IET оваа дырды E ERR E E 
Ае LEE EE РЕА р 


а а аса И E E E e 
Ма р ЕНЕН У а 4434 
АЧ. У ГРАНА LE RL 
CCC 
(c) To go further we require the values $c_i$ and $c$. Table 5.2 shows
the matrix of values $c_{ij}$ for the data, and we find
$$c = 426, \qquad \sum c_i^2 = 7470.$$
From (5.65) we then have
$$\operatorname{var} t = \frac{1}{30 \times 29 \times 28 \times 27}\left\{4 \times 7470 - \frac{2 \times 57}{30 \times 29} \times 426^2 - 60 \times 29\right\} = 0.006{,}630,$$
giving an estimated standard error of 0.0814. The 5 per cent
confidence limits, assuming normality, are then
    0.33 < t < 0.65.
These are much narrower than the values of (a) and (b).
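The arithmetic of (c) can be retraced in a few lines. This is only a sketch of the numerical step, taking the quoted values c = 426 and Σc_i² = 7470 on trust and assuming that, with no ties, every off-diagonal $c_{ij}$ is ±1 so that Σc_ij² = n(n − 1); the variable names are illustrative.

```python
import math

n = 30             # sample size in Example 5.1
t = 0.490          # observed coefficient, as quoted
c = 426            # sum of the c_i, as quoted
sum_ci_sq = 7470   # sum of squares of the c_i, as quoted
sum_cij_sq = n * (n - 1)   # assumption: untied data, each c_ij = +1 or -1

var_t = (4 * sum_ci_sq
         - 2 * (2 * n - 3) / (n * (n - 1)) * c ** 2
         - 2 * sum_cij_sq) / (n * (n - 1) * (n - 2) * (n - 3))
se = math.sqrt(var_t)
lower, upper = t - 1.96 * se, t + 1.96 * se
print(round(var_t, 6), round(se, 4), round(lower, 2), round(upper, 2))
```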
(d) To allow for departures from normality we require to estimate
$\gamma_1$, which in turn depends on $\mu_3(t)$. For moderate samples the
following formula gives an approximation:


8 Бе % 8e 
| ust) = 512; се + ср“ — = + + a . (5.71) 


where the first term in curly brackets is a summation of (е + ср“ 
over all values i > j, i.e. the values in Table 5.2 below the diagonal. 


After some tedious computation we find
    $\gamma_1 = -0.32$.
The adjusted 5 per cent limits from (5.69) are then
    0.32 < t < 0.64.
The corrections for non-normality are small, and the limits are very
similar to those given by (c).
CHAPTER 6


THE PROBLEM OF m RANKINGS


6.1 Up to this point we have been concerned with the correlation
between two rankings. We now consider the case when there are
several rankings, say m in number, of n individuals and we desire to
investigate the general relationship between them. Suppose, for
instance, four observers rank six objects as follows:


                          Object
                  A    B    C    D    E    F
Observer P        5    4    1    6    3    2
   ,,    Q        2    3    1    5    6    4
   ,,    R        4    1    6    3    2    5
   ,,    S        4    3    2    5    1    6

Totals of ranks: 15   11   10   19   12   17        (6.1)


In accordance with our known methods we can work out the rank
correlation coefficient between each pair of observers, obtaining
$\binom{4}{2} = 6$ coefficients. This, however, is not what we usually require.
We need a measure of the concordance of the observers taken as
a group.


6.2 Perhaps the most obvious procedure is to average all the
possible values of τ or of ρ between pairs of observers; but this is
evidently very tedious when m is large. In a case such as this the
quickest method is to consider the sum of ranks allotted by the
observers, as shown in the last row of (6.1). These numbers must
sum to 84, and in general to ½mn(n + 1), for they are composed of
a sum of m sets each of which is the sum of natural numbers 1 to n.
The mean value of the sums is then ½m(n + 1), in our present example,
14. Consider the deviations about this mean:

    1    −3    −4    5    −2    3        (6.2)

If all the rankings were identical the sums in (6.1) would consist of

    m, 2m, ..., nm

(though not necessarily in that order) and their deviations accordingly
    −½m(n − 1), −½m(n − 3), ..., ½m(n − 1).
The sum of squares of these deviations would be given by
$$\tfrac{1}{12}m^2(n^3 - n) \qquad (6.3)$$
This is the maximum value which the sum of squares may have.
Its extreme value at the other end of the range is zero.

Let us then write S for the sum of squares of the actual deviations,
in our present instance, from (6.2), 64. We define
$$W = \frac{12S}{m^2(n^3 - n)} \qquad (6.4)$$
and call W the coefficient of concordance. In our example,
$$W = \frac{12 \times 64}{16 \times 210} = 0.229.$$
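The definition (6.4) translates directly into a short routine. The sketch below assumes untied rankings supplied as lists of the integers 1 to n, one list per observer; the function name `concordance` is illustrative only.

```python
def concordance(rankings):
    """Return (S, W) for m untied rankings of n objects.

    rankings: list of m lists, each a permutation of 1..n, where
    rankings[i][j] is the rank given by observer i to object j.
    """
    m, n = len(rankings), len(rankings[0])
    totals = [sum(r[j] for r in rankings) for j in range(n)]
    mean = m * (n + 1) / 2                      # mean of the rank totals
    S = sum((tot - mean) ** 2 for tot in totals)
    return S, 12 * S / (m ** 2 * (n ** 3 - n))


table_6_1 = [
    [5, 4, 1, 6, 3, 2],   # observer P
    [2, 3, 1, 5, 6, 4],   # observer Q
    [4, 1, 6, 3, 2, 5],   # observer R
    [4, 3, 2, 5, 1, 6],   # observer S
]
S, W = concordance(table_6_1)
print(S, round(W, 3))
```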


6.3 W measures, in a sense, the communality of judgments for
the m observers. If they all agree W = 1. If they differ very much
among themselves the sums of ranks will be more or less equal, and
consequently the sum of squares S becomes small compared with the
maximum possible value, so that W is small. As W increases from
0 to 1 the deviations become "more different" and there is a greater
measure of agreement in the rankings.


6.4 The reader may wonder why we have chosen a coefficient
ranging from 0 to 1 and not from − 1 to 1 as for a rank correlation
coefficient. The answer is that when more than two observers are
involved, agreement and disagreement are not symmetrical opposites.
m observers may all agree but they cannot all disagree completely,
in the sense here considered. If, of three observers P, Q, and R,
P disagrees with Q on a comparison and also disagrees with R,
then Q and R must agree.


6.5 If we write $\rho_{av}$ for the mean value of the Spearman coefficient
between the $\binom{m}{2}$ possible pairs of observers, then
$$\rho_{av} = \frac{mW - 1}{m - 1} \qquad (6.5)$$
For if the rank of the jth object by the ith observer, measured from
the mean ½(n + 1), is $x_{ij}$, the average ρ is
$$\rho_{av} = \frac{\sum_{i \neq k}\sum_{j=1}^{n} x_{ij}x_{kj}}{\frac{1}{12}m(m-1)(n^3-n)}$$
$$= \frac{12}{m(m-1)(n^3-n)}\left\{\sum_{j=1}^{n}\Big(\sum_{i=1}^{m} x_{ij}\Big)^2 - \sum_{i=1}^{m}\sum_{j=1}^{n} x_{ij}^2\right\}$$
$$= \frac{12}{m(m-1)(n^3-n)}\left\{S - \frac{m(n^3-n)}{12}\right\}$$
$$= \frac{mW - 1}{m - 1} \qquad (6.6)$$
When $\rho_{av}$ = + 1, W = 1. When W = 0, $\rho_{av}$ = − 1/(m − 1), and
this is the least value which the average ρ can take, a further
illustration of the point made at the end of the last section.

The case we are considering is one wherein ρ is a more convenient
coefficient than τ. There appears to be no simple method of
expressing the average τ in terms of the sums of ranks.
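The identity (6.5) is easy to confirm numerically. The following sketch averages Spearman's ρ over the six pairs of observers in (6.1) and compares the result with (mW − 1)/(m − 1); the helper `spearman_rho` uses the usual untied formula 1 − 6Σd²/(n³ − n) and is named here for illustration only.

```python
from itertools import combinations


def spearman_rho(x, y):
    """Spearman's rho for two untied rankings of the same n objects."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n ** 3 - n)


rankings = [
    [5, 4, 1, 6, 3, 2],   # observer P of (6.1)
    [2, 3, 1, 5, 6, 4],   # observer Q
    [4, 1, 6, 3, 2, 5],   # observer R
    [4, 3, 2, 5, 1, 6],   # observer S
]
m, n = len(rankings), len(rankings[0])
totals = [sum(col) for col in zip(*rankings)]
S = sum((tot - m * (n + 1) / 2) ** 2 for tot in totals)
W = 12 * S / (m ** 2 * (n ** 3 - n))

pairs = list(combinations(rankings, 2))
rho_av = sum(spearman_rho(x, y) for x, y in pairs) / len(pairs)
formula = (m * W - 1) / (m - 1)
print(round(rho_av, 4), round(formula, 4))
```

Both figures come to about −0.029, slightly above the lower bound −1/(m − 1) = −0.333.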


6.6 If some of the rankings contain ties we may write, as in (3.6),
$$T' = \frac{1}{12}\sum_t (t^3 - t) \qquad (6.7)$$
In this case we shall define the coefficient W as
$$W = \frac{S}{\frac{1}{12}m^2(n^3 - n) - m\sum T'} \qquad (6.8)$$
the summation $\sum T'$ taking place over the various rankings.
In this case (6.6) requires some modification.


6.7 This definition requires a little comment, for the denominator
in (6.8) is not necessarily the maximum value which the sum of
squares of totals of ranks (measured from their mean) may have.
We may, in fact, define W in the untied case by the alternative
formula
$$W = \frac{S}{mS'} \qquad (6.9)$$
where $S'$ is the sum of squares of deviations of all ranks from their
mean. This evidently accords with our previous definition, for the
sum of squares of deviations in any ranking is $\frac{1}{12}(n^3 - n)$ and there
are m rankings. The definition (6.9) also accords with (6.8), for,
as we have seen in 3.10, the effect of ties in a ranking providing a
number $T'$ is to reduce the sum of squares of deviations by $T'$.
The reason for adopting (6.9) is that it bears an analogy to the
analysis of variance.* Suppose we array the ranks (measured from
their mean) as
$$\begin{matrix} x_{11} & x_{12} & \dots & x_{1n} \\ x_{21} & x_{22} & \dots & x_{2n} \\ \dots & & & \\ x_{m1} & x_{m2} & \dots & x_{mn} \end{matrix} \qquad (6.10)$$
The sums of rows are all zero with corresponding zero mean.
The sums of columns may be written $S_1, \ldots, S_n$, with means
$S_1/m, \ldots, S_n/m$.
Then the variance of the whole array is $S'/mn$ by definition. The
variance of column means is $\frac{1}{n}\sum (S_j/m)^2 = S/m^2n$. The ratio of the
two is thus $S/mS' = W$, which is thus exhibited as the ratio of the
variance of column means to the whole variance. This, and still more
the ratio $S/(mS' - S)$, is a familiar ratio in the analysis of variance.


Example 6.1
Consider the three rankings

P            1    4½   2    4½   3    7½   6    9    7½   10
Q            2½   1    2½   4½   4½   8    9    6½   10   6½
R            2    1    4½   4½   4½   4½   8    8    8    10

Totals:      5½   6½   9    13½  12   20   23   23½  25½  26½

Deviations
from mean
(16½):      −11  −10  −7½  −3   −4½   3½   6½   7    9    10

For the sum of squares of deviations S we find 591.
For the T'-numbers

P :  $\frac{1}{12} \times 2(2^3 - 2) = 1$
Q :  $\frac{1}{12} \times 3(2^3 - 2) = 1\tfrac{1}{2}$
R :  $\frac{1}{12}(4^3 - 4 + 3^3 - 3) = 7$


* See Kendall, Advanced Theory, Vol. II, Chapters 21-3. 
Thus, from (6.8),
$$W = \frac{591}{742.5 - 28.5} = 0.828.$$

The effect of taking ties into account is evidently small.
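The tied case (6.8) can be sketched in the same way, using exact fractions so that the half-ranks introduce no rounding. The three rankings below are those of Example 6.1 as reconstructed from a damaged scan, so the data should be treated as an assumption; the function names are illustrative.

```python
from collections import Counter
from fractions import Fraction as F


def tie_correction(ranking):
    """T' = (1/12) * sum of (t^3 - t) over groups of t tied ranks (6.7)."""
    return sum(F(t ** 3 - t, 12) for t in Counter(ranking).values())


def concordance_with_ties(rankings):
    """Return (S, W) using the tie-adjusted denominator of (6.8)."""
    m, n = len(rankings), len(rankings[0])
    totals = [sum(col) for col in zip(*rankings)]
    mean = F(m * (n + 1), 2)
    S = sum((tot - mean) ** 2 for tot in totals)
    denom = F(m ** 2 * (n ** 3 - n), 12) - m * sum(tie_correction(r) for r in rankings)
    return S, S / denom


h = F(1, 2)   # one half, used to write tied ranks such as 4 1/2
P = [1, 4 + h, 2, 4 + h, 3, 7 + h, 6, 9, 7 + h, 10]
Q = [2 + h, 1, 2 + h, 4 + h, 4 + h, 8, 9, 6 + h, 10, 6 + h]
R = [2, 1, 4 + h, 4 + h, 4 + h, 4 + h, 8, 8, 8, 10]

S, W = concordance_with_ties([P, Q, R])
print(S, [tie_correction(r) for r in (P, Q, R)], round(float(W), 3))
```

Dropping the tie term from the denominator gives 591/742.5, about 0.796, which is the comparison behind the remark that the effect of ties is small.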


6.8 We now consider the testing of the significance of an
observed value of W. If all the observers are independent in their
judgments, then any set of rankings is just as probable as any other
set. We shall therefore consider the distribution of W in the $(n!)^m$
possible sets of ranks and use it in the customary way to reject or
accept the hypothesis that the observers have no community of
preference.

The actual distribution of W has been worked out for lower values
of m and n: n = 3, m = 2 to 10; n = 4, m = 2 to 6; n = 5, m = 3.
These form the basis of Appendix Tables 5. For higher values we
may use two approximations.

(1) For all values other than those in Appendix Tables 5 an
approximation may be based on the distribution known in statistics
as Fisher's z-distribution. We write
$$z = \tfrac{1}{2}\log_e \frac{(m-1)W}{1 - W} \qquad (6.11)$$
$$\nu_1 = (n - 1) - \frac{2}{m}, \qquad \nu_2 = (m - 1)\nu_1 \qquad (6.12)$$

(2) If we write
$$\chi_r^2 = m(n - 1)W = \frac{S}{\frac{1}{12}mn(n + 1)} \qquad (6.13)$$
then $\chi_r^2$ is distributed in the form known in statistics as $\chi^2$ with
ν = n − 1 degrees of freedom.
Example 6.2
For 18 rankings of 7 a value of S is found of 1,620 so that
$$W = \frac{1620}{\frac{1}{12} \times 18^2 \times 336} = 0.179.$$

From Appendix Table 6 we see that for the 5 per cent level

    m = 15    S = 864.9
    m = 20    S = 1158.7
and for the 1 per cent level
    m = 15    S = 1129.5
    m = 20    S = 1521.9.

For m = 18 the appropriate values of S lie between the values for
m = 15, m = 20, and our observed value is greater than the value
for 1 per cent. This means that the probability of obtaining a value
as great as or greater than the observed value is less than 0.01;
the value lies, we may say, beyond the 1 per cent point. It is thus
significant if we agree that such small probabilities are significant.


Example 6.3

In 28 rankings of 13 a value of S was found of 11,440 and hence
$$W = \frac{11440}{\frac{1}{12} \times 28^2 \times 2184} = 0.080.$$

We may test this by the use of (6.13). We have
$$\chi_r^2 = \frac{11440}{\frac{1}{12} \times 28 \times 13 \times 14} = 26.94.$$
From Appendix Table 8 we see that for ν = n − 1 = 12 degrees of
freedom, at the 1 per cent significance level, $\chi^2$ = 26.217. Our
observed value is slightly greater than this and is thus "just
significant" at the 1 per cent point.
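The figures of Examples 6.2 and 6.3 follow from (6.4) and (6.13) by plain arithmetic. A minimal sketch follows; the function names are illustrative, and the critical value 26.217 is simply the tabulated figure quoted in the text.

```python
def concordance_from_S(S, m, n):
    """W = 12 S / (m^2 (n^3 - n)), equation (6.4)."""
    return 12 * S / (m ** 2 * (n ** 3 - n))


def chi_square_r(S, m, n):
    """chi_r^2 = S / ((1/12) m n (n + 1)), equation (6.13)."""
    return 12 * S / (m * n * (n + 1))


W_62 = concordance_from_S(1620, m=18, n=7)     # Example 6.2
W_63 = concordance_from_S(11440, m=28, n=13)   # Example 6.3
chi_63 = chi_square_r(11440, m=28, n=13)
critical_1pc = 26.217   # chi-square with 12 d.f., 1 per cent point, as quoted

print(round(W_62, 3), round(W_63, 3), round(chi_63, 2), chi_63 > critical_1pc)
```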


Example 6.4

In practical cases the number m is often so large that no correction
for continuity need be made, but if one is desirable, it may be
introduced by adding 2 to the denominator and subtracting unity from
the numerator in
$$W = \frac{S}{\frac{1}{12}m^2(n^3 - n)}$$
For instance, consider the case n = 3, m = 9, and suppose S = 78.
From Appendix Table 5A we see that the probability of such a value
or greater is 0.010, so that this is approximately the 1 per cent point.

Suppose we apply the z-test to these data. We find, with continuity
corrections,
$$W = \frac{78 - 1}{\frac{1}{12}(81 \times 24) + 2} = 0.4695$$
$$\nu_1 = \frac{16}{9}, \qquad \nu_2 = \frac{128}{9}$$

By linear interpolation of reciprocals in Appendix Table 7B we find,
for these values of $\nu_1$ and $\nu_2$, a value of z equal to 0.954 against the
exact value of 0.979. Even for such low values of n as 3 the
approximation given by the z-test is fair.
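The continuity-corrected calculation of Example 6.4 can be retraced as follows. This is a sketch of the arithmetic only: it reproduces W, the degrees of freedom of (6.12) and the z of (6.11), and does not attempt the table look-up.

```python
import math

m, n, S = 9, 3, 78

# Continuity correction: subtract 1 from the numerator, add 2 to the denominator.
W_c = (S - 1) / (m ** 2 * (n ** 3 - n) / 12 + 2)

# Fisher's z and its degrees of freedom, equations (6.11) and (6.12).
z = 0.5 * math.log((m - 1) * W_c / (1 - W_c))
nu1 = (n - 1) - 2 / m
nu2 = (m - 1) * nu1

print(round(W_c, 4), round(z, 3), nu1, nu2)
```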


6.9 When ties are present the z-test requires no modification
unless the number or extent of the ties is large.
In the latter case the test becomes more complicated. Let $\mu_i$
be the variance of the ith ranking, typified by $\frac{1}{12}(n^2 - 1) - T'/n$.
Write
$$\mu_2(W) = \frac{4\sum \mu_i\mu_j}{m^2(n - 1)(\sum \mu_i)^2} \qquad (6.14)$$
the summation in the numerator extending over the ½m(m − 1) pairs
of values i < j. Then W may be tested with z given by (6.11)
and the modified degrees of freedom
$$\nu_1 = \frac{2(m - 1)}{m^3\mu_2(W)} - \frac{2}{m}, \qquad \nu_2 = (m - 1)\nu_1 \qquad (6.15)$$
The appropriate value of $\chi_r^2$ is
$$\chi_r^2 = \frac{S}{\frac{1}{12}mn(n + 1) - \frac{1}{n - 1}\sum T'} \qquad (6.16)$$

Estimation

6.10 Suppose now that a value of W has been found to be
significant, so that there is evidence of some agreement among the
observers. If we go further and suppose that their judgments are
attempts to arrive at a true ranking of the objects, what is the best
estimate we can make of that true ranking?
6.11 Suppose we have three rankings of eight as follows:


                        Object
              A    B    C    D    E    F    G    H
Observer P    4    2    1    7    6    3    5    8
   ,,    Q    7    2    1    6    4    5    3    8
   ,,    R    7    4    2    6    5    3    1    8
Totals:      18    8    4   19   15   11    9   24        (6.17)


One procedure which we must dismiss is that of ranking according
to the number of "firsts", "seconds", etc., obtained by each
individual. For instance we might rank C first because it has two
"firsts". Object G has the remaining "first" and we might rank
it second. Looking then to the "seconds" we find that B has two
so we rank it third. The other second occurred under C, which has
already been ranked, so we proceed to the "thirds", and so on.
The ranking obtained in this way is

    C  G  B  F  E  A  D  H        (6.18)

When we consider the "fourths", there are two members, E and A,
having one each, but we give precedence to the former because it has
a fifth whereas the latter has only two "sevenths".

This procedure is not self-consistent. Suppose we start from the
other end of the ranking and rank as 8 the individual with the greatest
number of "eighths" and so on. Then in our present example we get

    C  B  G  F  E  D  A  H        (6.19)

which is not the same as before. In general there is no particular
reason for starting at one end rather than at the other, and it is
evidently unsatisfactory that the two procedures should give different
results.


6.12 A better procedure is to rank according to the sums of
ranks allotted to the individuals. Thus, in (6.17) C has the least
total, so we rank it first, B has next lowest, and so on. The ranking
thus obtained is

    C  B  G  F  E  A  D  H        (6.20)

which, it may be remarked, is different from either (6.18) or (6.19).

It may be shown (as in the next chapter) that this gives a "best"
estimate in a certain sense associated with least squares. In fact,
the sum of squares of differences between what the totals are and
what they would be if all rankings were alike is a minimum when the
ranking is estimated by this method. Furthermore, if the ranking
arrived at by this method is correlated by Spearman's ρ with the
observed rankings, the mean ρ so obtained is larger than for any other
estimated ranking. This is not necessarily true for τ also but will
usually be so for rankings of moderate size.
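Ranking by sums of ranks amounts to a single sort. The sketch below applies it to the three rankings of (6.17), which are taken here as reconstructed from a damaged scan (an assumption); the variable names are illustrative.

```python
objects = "ABCDEFGH"
table_6_17 = {
    "P": [4, 2, 1, 7, 6, 3, 5, 8],
    "Q": [7, 2, 1, 6, 4, 5, 3, 8],   # assumption: row reconstructed from the text
    "R": [7, 4, 2, 6, 5, 3, 1, 8],
}

# Total of the ranks allotted to each object.
totals = [sum(col) for col in zip(*table_6_17.values())]

# Order the objects from least total to greatest.
order = sorted(range(len(objects)), key=lambda j: totals[j])
estimated = "".join(objects[j] for j in order)
print(totals, estimated)
```

No two totals coincide here, so the sort is unambiguous; tied totals need the further rule discussed in 6.13.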

6.13 A few points require mention in connection with ties.

(a) If no observed ranking contains ties and we do not wish to
admit them in the estimated ranking, occasional ambiguities may
arise. Suppose that three particular objects are ranked as follows:


                  X     Y     Z
Observer P        7     8    10
   ,,    Q        9     8     6
   ,,    R        3     7     6
   ,,    S        5     1     2
Totals:          24    24    24        (6.21)


The totals being the same, our method gives no criterion of choice.
If ties are permitted we should rank them all alike. If not, then it
seems best to give precedence to that object for which the ranks
cluster most closely. Since the totals are the same, this is equivalent
to choosing first the object for which the sum of squares of ranks is
least. The sums of squares for X, Y, Z are 164, 178, 176, so


... cancels preferences between the tied ... continues to rank
according to preferences even when ties exist.

... a class of 30 boys ... and suppose 6 female teachers
rank the same class giving W = 0.27 ... This shows that the women
have a greater community of preference than the men; but if we ...
from male teachers in general (and so for
women) we cannot apply a test of significance to the hypothesis that
women teachers are more in agreement among themselves than men.
This is a gap in our knowledge which it would be useful to fill.


6.15 The provision of the above tests of significance should not
be allowed to obscure the desirability of examining the primary data
to see if there are any obvious effects present. When a number of
observers are suspected a priori to be heterogeneous in their tastes,
it may obscure meaningful effects to assemble their rankings into
a single group. To take the extreme case, suppose ten observers are
in complete agreement about a ranking of six, so that the sums of
ranks are

    10, 20, 30, 40, 50, 60
with W = 1. Suppose that 10 further observers, also in complete
agreement, rank the objects in the inverse order

    60, 50, 40, 30, 20, 10
also with W = 1. The sum of the rankings for the 20 observers is,
of course, 70 for each object with W = 0. For the whole group we
might conclude that there was no community of preference, whereas
in reality the community of one set of observers has completely
masked that of the other.
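The extreme case just described is readily reproduced. The sketch below builds the two opposed groups of ten observers and evaluates W for each group and for the pooled set; the function name `concordance_w` is illustrative.

```python
def concordance_w(rankings):
    """W for m untied rankings of n objects, equation (6.4)."""
    m, n = len(rankings), len(rankings[0])
    totals = [sum(col) for col in zip(*rankings)]
    mean = m * (n + 1) / 2
    S = sum((tot - mean) ** 2 for tot in totals)
    return 12 * S / (m ** 2 * (n ** 3 - n))


group_1 = [[1, 2, 3, 4, 5, 6] for _ in range(10)]   # ten observers in agreement
group_2 = [[6, 5, 4, 3, 2, 1] for _ in range(10)]   # ten in the inverse order

pooled_totals = [sum(col) for col in zip(*(group_1 + group_2))]
print(concordance_w(group_1), concordance_w(group_2),
      concordance_w(group_1 + group_2), pooled_totals)
```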
CHAPTER 7
PROOF OF THE RESULTS OF CHAPTER 6


7.1 We shall first establish the validity of the z-distribution as
providing a test of the concordance coefficient W in the population
of $(n!)^m$ possible rankings. A proof for the case of untied rankings
is given in Kendall's Advanced Theory, Vol. I, 16.33, and will not
be repeated here. As we shall require general results for tied
rankings, we give a somewhat more general investigation, due to Pitman
(1938).


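7.2 Suppose we have m sets of numbers
$$\begin{matrix} a_1 & a_2 & \dots & a_n \\ b_1 & b_2 & \dots & b_n \\ \dots & & & \\ k_1 & k_2 & \dots & k_n \end{matrix} \qquad (7.1)$$
We will suppose that each is measured about the mean of the row
in which it occurs, so that the means of rows and the mean of the
whole are zero. We then have
$$W = \frac{\frac{1}{m}\sum_{j=1}^{n}(a_j + b_j + \dots + k_j)^2}{\sum_{1}^{n} a_j^2 + \sum_{1}^{n} b_j^2 + \dots + \sum_{1}^{n} k_j^2} \qquad (7.2)$$
The denominator in the expression is a constant and the variability
arises solely from the numerator. If $\alpha_2$ represents the second moment
of the a-row and so on for b, c, etc., the denominator is $n\sum\alpha_2$. Writing
$$R_{ab} = \sum_{j=1}^{n} a_jb_j \qquad (7.3)$$
$$U = \sum R_{ab} \qquad (7.4)$$
where $R_{ab}$ stands generally for the R of any pair of rows and the
summation extends over the $\binom{m}{2}$ pairs, we have
$$W = \frac{1}{m} + \frac{2U}{mn\sum\alpha_2} \qquad (7.5)$$
We consider first the moments of $R_{ab}$, then those of U and finally those
of W.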
7.3 We have, writing E for expected values,
$$E(R_{ab}) = 0$$
$$E(R_{ab}^2) = E\Big(\sum a_jb_j\Big)^2 = E\Big(\sum a_j^2b_j^2 + \sum{}' a_ia_jb_ib_j\Big)$$
$$= nE(a_j^2b_j^2) + n(n - 1)E(a_ia_jb_ib_j)$$
$$= nE(a^2)E(b^2) + \frac{n(n-1)}{n^2(n-1)^2}\Big\{E\big(\sum a\big)^2 - E\sum a^2\Big\}\Big\{E\big(\sum b\big)^2 - E\sum b^2\Big\}$$
$$= n\alpha_2\beta_2 + \frac{n\alpha_2\beta_2}{n - 1} = \frac{n^2\alpha_2\beta_2}{n - 1} \qquad (7.6)$$
In a similar way (we omit the algebraical details) we find
$$E(R_{ab}^3) = \frac{n^3\alpha_3\beta_3}{(n-1)(n-2)} = \frac{(n-1)(n-2)}{n}\,\kappa_3\kappa_3' \qquad (7.7)$$
$$E(R_{ab}^4) = \frac{3n^4\alpha_2^2\beta_2^2}{(n-1)(n+1)} + \frac{(n-1)(n-2)(n-3)}{n(n+1)}\,\kappa_4\kappa_4' \qquad (7.8)$$
where we write $\alpha_3$ for the third moment of the a-array and $\kappa_3$, $\kappa_4$
for its third and fourth k-statistics (a prime denoting the
corresponding quantity for the b-array), which are defined in terms of the
moments by
$$\kappa_3 = \frac{n^2}{(n-1)(n-2)}\,\alpha_3$$
$$\kappa_4 = \frac{n^2}{(n-1)(n-2)(n-3)}\big\{(n+1)\alpha_4 - 3(n-1)\alpha_2^2\big\}$$
The point of using k-statistics is that for normal populations they
vanish for degree higher than 2 and may therefore be presumed
small for populations reasonably near to normality.


7.4 To find the moments of U we note that
    $E(R_{ij}R_{kl}) = 0$ for all suffixes except i = k, j = l;
    $E(R_{ij}R_{kl}R_{mn}) = 0$ unless the suffixes form a "circular" set
    such as ij, jk, ki.
Similarly with four terms the only type not vanishing is one in which
the suffixes are ij, jk, kl, li. Hence
$$E(U) = E\big(\sum R\big) = 0 \qquad (7.9)$$
$$E(U^2) = E\big(\sum R^2 + \sum{}' R_{ij}R_{kl}\big) = \frac{n^2}{n - 1}\sum \alpha_2\beta_2 \qquad (7.10)$$
Further
$$E(R_{ab}R_{bc}R_{ca}) = E\Big(\sum a_ib_i \sum b_ic_i \sum c_ia_i\Big)$$
$$= E\Big[\Big\{\sum b_i^2a_ic_i + \sum{}' b_ib_ja_ic_j\Big\}\sum c_ia_i\Big]$$
$$= \big\{E(b_i^2) - E(b_ib_j)\big\}\,E\Big(\sum a_ic_i\Big)^2$$
$$= \Big(\beta_2 + \frac{\beta_2}{n - 1}\Big)\frac{n^2\alpha_2\gamma_2}{n - 1} = \frac{n^3\alpha_2\beta_2\gamma_2}{(n - 1)^2}$$
$$\sum E(R^3) = \frac{n^3}{(n-1)(n-2)}\sum \alpha_3\beta_3$$
Hence
$$E(U^3) = \frac{6n^3}{(n - 1)^2}\sum \alpha_2\beta_2\gamma_2 + \frac{n^3}{(n-1)(n-2)}\sum \alpha_3\beta_3 \qquad (7.11)$$


Finally—again we omit the algebra-— 


3ni 2 (n —1)(n — 2)(n — 8) 
E(U4) = 20252 
(U4) ( — 1n + 1) «28 n(n 4-1) Zo, 
3n 2 
пи (а — ig E саћа)“ ES Z ofi) 
tof 4 
+ 12(n — 2) Z ову, ЗЕ n 2c 2 20 2 . (7.12) 
(n — 1)5 
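Finally, for the moments of W,
$$E(W) = \frac{1}{m} = \bar{W}, \text{ say} \qquad (7.13)$$
$$E(W - \bar{W})^2 = \frac{4\sum \alpha_2\beta_2}{m^2(n - 1)(\sum \alpha_2)^2} \qquad (7.14)$$
and, neglecting terms involving $\kappa_3$ and $\kappa_4$ which are of lower order in n,
$$E(W - \bar{W})^3 = \frac{48\sum \alpha_2\beta_2\gamma_2}{m^3(n - 1)^2(\sum \alpha_2)^3} \qquad (7.15)$$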
EW — јр): — 48 (Zo 96 У 282 


та т — 1)? (Z a)i m*(n — 1){n + 1) (Cg 


1152 Zaf oy 5. 
mn — туз ape Б (7.16) 
7.5 Consider now the case when the ranks are untied. In this
case all the variances of rankings are equal and we find
$$E(W) = \frac{1}{m} \qquad (7.17)$$
$$\mu_2(W) = E(W - \bar{W})^2 = \frac{2(m - 1)}{m^3(n - 1)} \qquad (7.18)$$
$$\mu_3(W) = E(W - \bar{W})^3 = \frac{8(m - 1)(m - 2)}{m^5(n - 1)^2} \qquad (7.19)$$

90% Pee. 12(m — 1)? , 48(m — 1)(m — 2)(m — 8) 
FU) И) вту та — 1)» 


_ __486%—1) 20 
m(n — 1)%( + 1) 
Now consider the distribution
$$dF = \frac{1}{B(p, q)}\,W^{p-1}(1 - W)^{q-1}\,dW \qquad (7.21)$$
The first two moments are
$$\mu_1'(W) = \frac{p}{p + q} \qquad (7.22)$$
$$\mu_2(W) = \frac{pq}{(p + q)^2(p + q + 1)} \qquad (7.23)$$
If we equate (7.17) to (7.22) and (7.18) to (7.23) we find
$$p = \tfrac{1}{2}(n - 1) - \frac{1}{m}, \qquad q = (m - 1)p \qquad (7.24)$$
The distribution of W is thus approximated by (7.21) when p
and q have these values. The third moment of (7.21) is
$$\frac{8(m-1)(m-2)}{m^4(n-1)\{m(n-1) + 2\}} = \frac{8(m-1)(m-2)}{m^5(n-1)^2}\left\{1 - \frac{2}{m(n-1) + 2}\right\}$$
Comparing this with (7.19) we see that the third moment of (7.21)
is approximately equal to the third moment of W unless m(n − 1)
is small. Again the fourth moment of (7.21) is
$$\frac{12(m-1)\{(m-1)y + 4m^2 - 14(m-1)\}}{m^5(n-1)(y + 2)(y + 4)}$$
where y = m(n − 1); and this is approximately equal to the fourth
moment of W if m(n − 1) is not too small.
7.6 This distribution (7.21) has its first two moments exactly,
and its third and fourth moments approximately, equal to those of the
actual distribution of W; and thus we expect it to provide an
approximation. The accuracy of the approximation is, in fact, greater than
perhaps our rather long proof of the result might foreshadow.

The distribution (7.21) is known in statistical theory as the
B- (Beta) or Type I distribution. A simple transformation reduces
it to the form of Fisher's z-distribution. In fact, putting
$$z = \tfrac{1}{2}\log_e \frac{(m - 1)W}{1 - W}$$
we find that (7.21) reduces to
$$dF \propto \frac{e^{2pz}\,dz}{\{(m - 1) + e^{2z}\}^{p+q}}$$
which is Fisher's form with
$$\nu_1 = 2p, \qquad \nu_2 = 2q$$
so that, from (7.24),
$$\nu_1 = (n - 1) - \frac{2}{m}, \qquad \nu_2 = (m - 1)\nu_1,$$
as given in (6.12).
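The moment-matching step behind (7.24) can be verified with exact rational arithmetic. The sketch below checks, for a few arbitrary pairs (m, n), that a Beta distribution with p = ½(n − 1) − 1/m and q = (m − 1)p has mean 1/m and variance 2(m − 1)/m³(n − 1); the function names are illustrative.

```python
from fractions import Fraction as F


def beta_parameters(m, n):
    """p and q of (7.24) for m untied rankings of n objects."""
    p = F(n - 1, 2) - F(1, m)
    q = (m - 1) * p
    return p, q


def beta_mean_var(p, q):
    """Mean and variance of the Beta (Type I) distribution (7.21)."""
    mean = p / (p + q)
    var = p * q / ((p + q) ** 2 * (p + q + 1))
    return mean, var


for m, n in [(3, 4), (9, 3), (18, 7)]:
    p, q = beta_parameters(m, n)
    mean, var = beta_mean_var(p, q)
    assert mean == F(1, m)                            # matches (7.17)
    assert var == F(2 * (m - 1), m ** 3 * (n - 1))    # matches (7.18)
    print(m, n, 2 * p, 2 * q)                         # nu_1 and nu_2
```

For m = 9, n = 3 this gives the degrees of freedom 16/9 and 128/9 used in Example 6.4.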


7.7 If ties are present the test needs further consideration.
(a) In the above derivation we have only used the absence of
ties to evaluate the terms $\alpha_2$, and if all rankings are equally tied
these variances are still equal and the results hold.
(b) If the T'-numbers appropriate to ties are small compared
with $\frac{1}{12}(n^3 - n) = N$, say, again, the test requires no modification.
For then, to the first order in $\sum T'/mN$,
$$\frac{\sum \alpha_2\beta_2}{(\sum \alpha_2)^2} = \frac{\sum (N - T'_a)(N - T'_b)}{\{\sum (N - T')\}^2} = \frac{\frac{1}{2}m(m-1)N^2\big\{1 - \frac{2}{mN}\sum T'\big\}}{m^2N^2\big\{1 - \frac{2}{mN}\sum T'\big\}} = \frac{m - 1}{2m}$$
so that, to this order, the second moment of W is unchanged.
The effect on the third and fourth moments is also negligible to this
order. Hence our result.

(c) If the T'-numbers are large then we must calculate $\mu_2(W)$.
We shall then find
$$\nu_1 = 2p = \frac{2(m - 1)}{m^3\mu_2(W)} - \frac{2}{m}, \qquad \nu_2 = 2q = (m - 1)\nu_1 \qquad (7.25)$$
and the test may be applied with these values of $\nu_1$ and $\nu_2$.

7.8 We now prove that the distribution of the statistic
$$\chi_r^2 = m(n - 1)W = \frac{S}{\frac{1}{12}mn(n + 1)} \qquad (7.26)$$
tends as m increases to that of the $\chi^2$-distribution
$$dF \propto e^{-\frac{1}{2}\chi^2}\chi^{\nu - 1}\,d\chi \qquad (7.27)$$
with ν = n − 1.
In the array (7.1) consider the sum of any column, say the first,
which we will call p. We have
$$E(p) = E(a_1 + b_1 + \dots + k_1) = 0$$
$$E(p^2) = \sum E(a_1^2) = \sum \alpha_2$$
E(p?r!) is of order Бара + © toii Ba + ete. E(p?") is of order 
Го, +... + 2 ова + + + Хе The same argument аз we еш- 


ployed in 5.21 leads to the conclusion that Eper) is of lower 
order in m and that the dominant term in Elp”) is 2 сова . + Ба 
Thus in the limit ода moments vanish and 
И Qn POTERE Е . (7.28) 
(а £T o Qe 
When all $\alpha_2$'s are equal, or nearly so,
$$\frac{\mu_{2r}}{(\mu_2)^r} = \frac{(2r)!}{2^r\,r!}$$
and hence the distribution of p tends to normality with zero mean.
Now S is the sum of squares of n such variates, subject only to
the constraint that $\sum(p) = 0$. Thus kS is distributed as $\chi^2$ with
ν = n − 1, where k is a factor to be determined such that the mean
of kS is n − 1, the mean of $\chi^2$. But
$$E(kS) = kn\sum \alpha_2$$
and thus
$$k = \frac{n - 1}{n\sum \alpha_2}$$
and hence
$$\chi_r^2 = \frac{(n - 1)S}{n\sum \alpha_2}$$
is distributed as $\chi^2$.

When the rankings contain no ties each has the variance $\frac{1}{12}(n^2 - 1)$
and thus
$$\chi_r^2 = \frac{S}{\frac{1}{12}mn(n + 1)}$$
as given in (7.26).

If ties are present, represented by T'-numbers, the appropriate
value is
$$\chi_r^2 = \frac{(n - 1)S}{\sum\{\frac{1}{12}(n^3 - n) - T'\}} = \frac{S}{\frac{1}{12}mn(n + 1) - \frac{1}{n - 1}\sum T'} \qquad (7.29)$$
Unless the ties are substantial in extent the effect of the second
term in the denominator is small.


7.9 We now indicate the basis of calculating the actual
distributions of W (or equivalently of S) for the lower values of m and n.
For m = 2 the values are derivable from the distribution of
Spearman's ρ. We proceed from the case for given m, n to that
for m + 1, n. For example, with m = 2, n = 3, we have the following
values for the sums of ranks measured about their mean:

       Type          Frequency
   −2    0    2          1
   −2    1    1          2
   −1    0    1          2
    0    0    0          1

Here −2, 1, 1 and 2, −1, −1 are taken to be identical types,
for they give the same value of S and will also give similar types
when we proceed to the case m = 3 as follows.

For m = 3 each of the above types will appear added to six
permutations of −1, 0, 1; e.g. the type −2, 0, 2 will give one
each of −3, 0, 3; −3, 1, 2; −2, −1, 3; −2, 1, 1; −1, −1, 2;
and −1, 0, 1. These types are counted for each of the basic types
of m = 2, and we get:

       Type          Frequency
   −3    0    3          1
   −3    1    2          6
   −2    0    2          6
   −2    1    1          6
   −1    0    1         15
    0    0    0          2
                        36

For n = 5 and greater the labour becomes very considerable owing
to the large number of different types to be taken into account at
each stage. It seems, however, that for all ordinary purposes in
testing significance the z-distribution provides an adequate
approximation for greater values of m.
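For small m and n the exact distribution can also be obtained by brute-force enumeration rather than by building up types. The sketch below is not the step-by-step procedure of the text but a direct count, holding the first ranking in the natural order (which divides every frequency by n! without changing the proportions); the function name is illustrative.

```python
from collections import Counter
from fractions import Fraction as F
from itertools import permutations, product


def exact_distribution_of_S(m, n):
    """Frequency of each value of S over all sets of m rankings of n,
    with the first ranking fixed in the natural order."""
    first = tuple(range(1, n + 1))
    mean = F(m * (n + 1), 2)
    counts = Counter()
    for others in product(permutations(first), repeat=m - 1):
        totals = [sum(col) for col in zip(first, *others)]
        counts[sum((tot - mean) ** 2 for tot in totals)] += 1
    return dict(counts)


print(exact_distribution_of_S(2, 3))
print(exact_distribution_of_S(3, 3))
```

Each type in the tables above corresponds to one value of S (for instance −3, 1, 2 gives S = 14), so the counts can be compared line by line.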


7.10 Finally we have to show (as indicated in 6.12) that the
method of estimation there proposed is such as to maximise the
average ρ-correlation between the estimated and the observed
rankings.

Suppose the estimated ranking is $X_1, \ldots, X_n$ and let the sums
of ranks be $S_1, \ldots, S_n$. Then the average ρ is given by
$$\rho_{av} = \frac{1}{m}\sum_{i=1}^{m}\frac{12}{n^3 - n}\sum_{j=1}^{n}\{X_j - \tfrac{1}{2}(n + 1)\}\{x_{ij} - \tfrac{1}{2}(n + 1)\}$$
where $x_{ij}$ is the rank of the jth object in the ith ranking. This is
equal to
$$\frac{12}{m(n^3 - n)}\sum_{j=1}^{n}\{X_j - \tfrac{1}{2}(n + 1)\}\{S_j - \tfrac{1}{2}m(n + 1)\}$$
$$= \frac{12}{m(n^3 - n)}\Big\{\sum_{j=1}^{n} X_jS_j - \tfrac{1}{4}mn(n + 1)^2\Big\} \qquad (7.30)$$
This is clearly a maximum when $\sum(XS)$ is a maximum, i.e. when
the greatest S is multiplied by the greatest X and so on, the least S
being multiplied by the least X. Our suggested rule of estimation
does in fact ensure that the multiplications take place in this way,
and hence the result follows.

If we consider the sum
$$U = \sum_{j=1}^{n}(S_j - mX_j)^2 = \sum S^2 + m^2\sum X^2 - 2m\sum(XS)$$
we see that, since the first two terms on the right are constants,
U is minimised if $\sum(XS)$ is maximised. Our method of estimation
therefore minimises U, that is to say, minimises the sum of squares
of differences between the actual sums S and what they would be,
mX, if all rankings were identical.
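The maximising property can be confirmed by exhaustive search on a small case. The sketch below uses the four rankings of six objects from (6.1), computes the average ρ for every one of the 720 candidate rankings, and compares the best of them with the ranking by sums; the names are illustrative.

```python
from itertools import permutations

rankings = [
    [5, 4, 1, 6, 3, 2],   # observer P of (6.1)
    [2, 3, 1, 5, 6, 4],   # observer Q
    [4, 1, 6, 3, 2, 5],   # observer R
    [4, 3, 2, 5, 1, 6],   # observer S
]
m, n = len(rankings), len(rankings[0])
centre = (n + 1) / 2


def average_rho(estimate):
    """Mean Spearman rho between an estimated ranking and the observed ones."""
    total = 0.0
    for r in rankings:
        total += 12 / (n ** 3 - n) * sum((X - centre) * (x - centre)
                                         for X, x in zip(estimate, r))
    return total / m


# Ranking by sums of ranks: least total gets rank 1, and so on.
totals = [sum(col) for col in zip(*rankings)]
order = sorted(range(n), key=lambda j: totals[j])
by_sums = [0] * n
for rank, j in enumerate(order, start=1):
    by_sums[j] = rank

best = max(permutations(range(1, n + 1)), key=average_rho)
print(by_sums, list(best))
```

The totals in this example are all different, so the maximum is attained by a single ranking.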
CHAPTER 8
PARTIAL RANK CORRELATION


8.1 In interpreting an observed dependence between two
qualities we are constantly faced with the question whether an association
or correlation of A with B is really due to the associations or
correlations of each with a third quality C. In the theory of statistics this
kind of problem leads to the theories of partial association or
correlation which attempt to decide the matter by the consideration of
sub-populations in which the variation of C is eliminated. The
same problem arises in rank correlation. For instance, if there
appears a significant correlation between mathematical and musical
abilities in a number of subjects, the question arises whether this
may be attributable to the correlation of each with some more
fundamental quality such as intelligence. We proceed to consider a method
which may be applied in rank correlation theory to an investigation
of this kind of problem.


8.2 Suppose we have three rankings of 6 as follows 


РЕП, 2° 59: ay 5 MENO 
Qus итен 726 (6 JE A . (8.1) 
RA 93 ЭКС бу. 9 


There are 15 possible pairs. Taking some ranking as standard (it
does not matter which one, so we will take the one in which the
ranking is in the natural order), let us write down all possible pairs
and enter + underneath if the pair observed has that order and
− in the opposite case. We find:


(12) (13) (14) (15) (16) (23) (24) (25) (26) (34) (35) (36) (45) (46) (56)


» 3-1 Лы о too жі и 
/// ное 
L La Ке А а 
For the coefficients we find:
$$\tau_{PQ} = \frac{7}{15}, \qquad \tau_{PR} = \frac{3}{15}, \qquad \tau_{QR} = -\frac{1}{15}.$$
Consider now the following four-fold table, setting out the
agreements of rankings Q and R with P:


                                    Ranking R
                         Pairs +        Pairs −
                         (agreeing      (disagreeing     Totals
                         with P)        with P)
Ranking Q
  Pairs + (agreeing
    with P)                 6              5               11
  Pairs − (disagreeing
    with P)                 3              1                4
  Totals                    9              6               15        (8.2)

Here, for example, there are 11 cases in which Q agrees with P;
in 6 of these R also agrees with P, and in the remaining 5 it disagrees.
Generally in three rankings of n we shall have a table of the form

      a        b      |  a + b
      c        d      |  c + d
    a + c    b + d    |  N = $\binom{n}{2}$ = a + b + c + d        (8.3)

We now define a partial rank correlation coefficient of Q and R
with P as
$$\tau_{QR.P} = \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}} \qquad (8.4)$$
In our present example this becomes
$$\frac{6 - 15}{\sqrt{11 \times 4 \times 9 \times 6}} = -0.185$$
as compared with $\tau_{QR}$ = − 0.067.
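The coefficient (8.4) depends only on the four cell counts. A minimal sketch using the figures of table (8.2) follows; the function name is illustrative.

```python
import math


def partial_tau_from_counts(a, b, c, d):
    """Partial rank correlation (8.4) from the four-fold table of agreements."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


a, b, c, d = 6, 5, 3, 1          # cells of table (8.2)
N = a + b + c + d                # number of pairs
tau_qr = ((a + d) - (b + c)) / N # ordinary tau between Q and R
partial = partial_tau_from_counts(a, b, c, d)
print(N, round(tau_qr, 3), round(partial, 3))
```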
8.3 The coefficient of (8.4) is a coefficient of association in a
2 × 2 table and we have already met it in another connection in
3.14. It can vary from −1 to +1 but not outside those limits,
and measures the intensity of association between the agreements
of Q with P and those of R with P.
If the coefficient is unity we have

    $(ad - bc)^2 = (a + b)(a + c)(b + d)(c + d)$
giving
    $4abcd + a^2(bc + bd + cd) + b^2(ac + ad + cd)$
    $\quad + c^2(ab + ad + bd) + d^2(ab + ac + bc) = 0.$

Since no a, b, c, d can be negative this can only be true if at least
two of them are zero. If two in the same row or column are zero
we get the purely nugatory case in which either Q or R is in perfect
agreement or disagreement with P. We have then to consider only
b = 0 and c = 0 or a = 0 and d = 0. In the former case Q and R
agree completely upon their concordances with P and $\tau_{QR.P}$ = 1.
In the second case they disagree completely and the coefficient is − 1.

8.4 The reader who is acquainted with the quantity known as
$\chi^2$ can easily satisfy himself that
$$\tau_{QR.P}^2 = \frac{\chi^2}{N} \qquad (8.5)$$
$\chi^2$, and hence τ, measures the degree of departure from the case
when the dichotomy of preferences according to Q is independent
of those according to R. Suppose, in fact, that they are independent.
Then the frequencies in the table will be
$$\begin{matrix} \dfrac{(a + b)(a + c)}{N} & \dfrac{(a + b)(b + d)}{N} \\[2ex] \dfrac{(a + c)(c + d)}{N} & \dfrac{(b + d)(c + d)}{N} \end{matrix}$$
The differences between the observed values and these
"independence" values will then be typified by
$$\frac{(a + b)(a + c)}{N} - a = \frac{bc - ad}{N}$$
Thus $\chi^2$ is the sum of four terms like
$$\frac{(bc - ad)^2}{N^2} \Big/ \frac{(a + b)(a + c)}{N}$$
and the sum reduces to
$$\chi^2 = \frac{(bc - ad)^2(a + b + c + d)}{(a + b)(a + c)(c + d)(b + d)}$$
from which (8.5) follows.


8.5 We have therefore constructed a coefficient, capable of
varying from −1 to +1, which measures the extent to which Q
and R agree so far as concerns their agreement with P. If the
coefficient is + 1 they are in complete agreement; if it is zero ad − bc = 0
and a/b = c/d, so that the preferences are independent; if the
coefficient is − 1 they are in complete disagreement.

We may then say that partial τ as so defined measures the
agreement between Q and R independently of the influence of P. Partial
τ is increased by an agreement between Q and R whether they agree
with P or not. The point may be clearer from a further examination
of the table (8.3). For the ordinary rank correlation between Q and
R we shall have
$$\tau_{QR} = \frac{(a + d) - (b + c)}{N} \qquad (8.6)$$


In the table, however, we itemise t[...] according to whether they do or [...] containing $a$, $b$ shows us how far Q has + or − scores in those items for which R has only + scores. If this row is similar (in the sense of proportionality) to the row containing $c$, $d$, then Q has + or − scores in much the same proporti[...] or not. In such circumstances we can [...] much in agreement, except in so far [...] Our coefficient of partial $\tau$ [...] towards greater differences [...] i.e. gives a better indication that Q and R are more or less in agreement between themselves, whatev[...]


8.6 In addition to (8.6) we have

$$\tau_{PQ} = \frac{(a+c) - (b+d)}{N}, \qquad \tau_{PR} = \frac{(a+b) - (c+d)}{N}. \tag{8.7}$$

Remembering that $N = a + b + c + d$, we have

$$1 - \tau_{PQ}^2 = \frac{4(a+c)(b+d)}{N^2}, \qquad 1 - \tau_{PR}^2 = \frac{4(a+b)(c+d)}{N^2},$$

$$\tau_{QR} - \tau_{PQ}\tau_{PR} = \frac{1}{N^2}\Big[(a+b+c+d)(a-b-c+d) - (a-b+c-d)(a+b-c-d)\Big] = \frac{4(ad - bc)}{N^2}. \tag{8.8}$$

Thus from (8.4),

$$\tau_{QR.P} = \frac{\tau_{QR} - \tau_{PQ}\tau_{PR}}{\sqrt{(1-\tau_{PQ}^2)(1-\tau_{PR}^2)}}. \tag{8.9}$$

This expresses partial $\tau$ in terms of the coefficients $\tau$ between the original rankings. It is remarkable (but apparently is only a coincidence) that this relationship is formally the same as that expressing a partial product-moment correlation in terms of the constituent correlations.*


Example 8.1

Three rankings are given according to (1) intelligence, (2) mathematical ability, (3) musical ability. They are as follows:

a 7 8..9 00, 


(1) 1% 2,931899 ДЕ 

(2) Y ТАСС I 10 

(8) 4 qu 8*5 он о0о 9 8 
We find

$$\tau_{12} = 0.644, \qquad \tau_{13} = 0.644, \qquad \tau_{23} = 0.556.$$

Thus, from (8.9),

$$\tau_{23.1} = \frac{0.556 - (0.644)^2}{1 - (0.644)^2} = 0.24.$$

This correlation is weaker than that between (2) and (3) above, and we suspect that correlations between (1) and (2), (1) and (3) may be masking the real relationship between (2) and (3). This kind of inference, however, must be made with considerable reserve. It is nothing more than a suggestion for further inquiry, unless there are prior grounds for expecting the effect.

* See, for instance, Yule and Kendall, Introduction, 14.15.
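The arithmetic of Example 8.1 can be checked with a short sketch. The function names below are illustrative and not from the text, and the four pair counts used at the end are hypothetical, chosen only to show that the route through the three coefficients and the route through the fourfold table give the same value (the table is assumed to have rows for R and columns for Q).

```python
import math


def partial_tau(tau_qr, tau_pq, tau_pr):
    """Partial tau between Q and R with P held fixed, from the three pairwise taus."""
    denominator = math.sqrt((1.0 - tau_pq ** 2) * (1.0 - tau_pr ** 2))
    if denominator == 0.0:
        raise ValueError("partial tau is undefined when Q or R agrees perfectly with P")
    return (tau_qr - tau_pq * tau_pr) / denominator


def partial_tau_from_counts(a, b, c, d):
    """The same coefficient from the fourfold table of pair counts a, b, c, d."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Example 8.1: tau_12 = tau_13 = 0.644 and tau_23 = 0.556.
example_8_1 = partial_tau(tau_qr=0.556, tau_pq=0.644, tau_pr=0.644)

# Hypothetical counts (an assumption, not data from the text).
a, b, c, d = 20, 5, 7, 13
n_pairs = a + b + c + d
via_taus = partial_tau(
    tau_qr=(a + d - b - c) / n_pairs,
    tau_pq=(a + c - b - d) / n_pairs,
    tau_pr=(a + b - c - d) / n_pairs,
)
via_counts = partial_tau_from_counts(a, b, c, d)

print(round(example_8_1, 2), round(via_taus, 4), round(via_counts, 4))
```

The guard against a zero denominator reflects the fact that the coefficient has no meaning when Q or R reproduces P exactly.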
8.7 No tests of significance are yet known for partial $\tau$. The so-called $\chi^2$-test cannot be used because there are dependencies between certain scores entering into the quantities $a$, $b$, $c$, $d$; for instance, if A is ranked before B and B before C then A must be ranked before C. Here again we have a branch of the subject which might repay further investigation.


References

See Kendall (1942a). In his 1948 paper Hoeffding gives a rather complicated expression for the variance of partial $\tau$ in the limiting case when $n$ is large. If $\tau_{12} = \tau_{13} = 0$ the distribution of $\sqrt{n}\,(t_{23.1} - \tau_{23.1})$ is the same, in the limit, as that of $\sqrt{n}\,(t_{23} - \tau_{23})$.


CHAPTER 9

RELATIONSHIP OF RANK AND NORMAL CORRELATIONS

9.1 In this chapter we depart from the attitude previously taken up of considering rank correlations in their own right, irrespective of the nature of the parent population. We shall suppose that the parent is, in fact, normally distributed with correlation $\rho'$; that is to say, that there underlie the ranking process measurable continuous variates which are normal for each variate and normally correlated. We may then ask:

(a) What is the relationship, if any, between the rank correlation of a sample and the parent correlation $\rho'$?
(b) Can we use a sample rank correlation to estimate $\rho'$?
(c) If so, what are the standard errors of the resulting estimates?

9.2 We must first clear the ground by noting certain properties of a continuous population. Such a population cannot, in fact, possess a rank correlation; for the essence of ranking is that the objects shall be orderable, and the totality of values of a continuous variate cannot be ordered in this sense. They can be regarded as constituting a range of values, but between any two different values there is always another value, so that we cannot number them as would be required for ranking purposes.


Relation between t and parent correlation in the normal case

9.3 Nevertheless, a sample of values from a continuous population can be ranked and a coefficient calculated from the data. Suppose we find a value $t$ of the $\tau$-coefficient in such a sample. What is the relation between $t$ and $\rho'$? The answer is that if we take an average of $t$ over all possible samples, then, denoting this average by $E$,

$$E(t) = \frac{2}{\pi}\sin^{-1}\rho'. \tag{9.1}$$

For instance, if $\rho'$ is $1/\sqrt{2} = 0.707$, $E(t) = 0.5$.

We may therefore argue that if we put

$$r' = \sin\tfrac{1}{2}\pi t \tag{9.2}$$

then $r'$ is a reasonable estimate of the parent $\rho'$. It does not follow that $r'$ is an unbiassed estimator of $\rho'$ in the sense that

$$E(r') = E\left(\sin\tfrac{1}{2}\pi t\right) = \rho' \tag{9.3}$$

although

$$E(t) = \frac{2}{\pi}\sin^{-1}\rho'.$$

Equation (9.3) has not been shown to be true, though it may be true in the limit for large samples. Nevertheless, the fact that the inverted form (9.1) is true makes it plausible to select $r'$ as an estimator of $\rho'$. The relation appears to be due to Greiner (1909) and the variance which we quote below to Esscher (1924).
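The relation between the expected rank coefficient and the parent correlation, and its inversion into an estimate, can be tried numerically. The following is a minimal sketch; the function names are illustrative and do not come from the text.

```python
import math


def expected_tau(rho_parent):
    """Expected sample tau for a normal parent with correlation rho_parent."""
    return (2.0 / math.pi) * math.asin(rho_parent)


def estimate_parent_correlation(t):
    """Estimate r' of the parent correlation obtained by inverting the relation above."""
    return math.sin(0.5 * math.pi * t)


half = expected_tau(1.0 / math.sqrt(2.0))      # the text's instance, rho' = 0.707
round_trip = estimate_parent_correlation(expected_tau(0.3))
print(round(half, 3), round(round_trip, 3))
```

The round trip returns the parent value only because the expected $t$ is fed in; for an observed $t$ the estimate is subject to the sampling error discussed in the following sections.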


9.4 We shall show in the next chapter that for samples from a normal population with correlation $\rho'$

$$\operatorname{var} t = \frac{2}{n(n-1)}\left\{1 - \left(\frac{2}{\pi}\sin^{-1}\rho'\right)^2 + 2(n-2)\left[\frac{1}{9} - \left(\frac{2}{\pi}\sin^{-1}\tfrac{1}{2}\rho'\right)^2\right]\right\}. \tag{9.4}$$

We know that $t$ is normally distributed for large samples and hence may use this result to test the significance of an observed $t$ by reference to the normal integral. But to do so we have to assume some values for the unknown $\rho'$. In accordance with the usual practice in the theory of large samples we shall replace $\rho'$ by $r'$ in (9.4), $r'$ itself being given by (9.2).

Now if $P$, $Q$ are the positive and negative scores contributing to $t$, and if we write

$$p = \frac{2P}{n(n-1)}, \qquad q = \frac{2Q}{n(n-1)}, \tag{9.5}$$

so that $p + q = 1$, we have

$$t = p - q$$

and hence

$$1 - t^2 = 4pq. \tag{9.6}$$

Since

$$\frac{2}{\pi}\sin^{-1} r' = t$$

we then have, from (9.4), for large samples,

$$\operatorname{var} t = \frac{2}{n(n-1)}\left\{4pq + 2(n-2)\left[\frac{1}{9} - \left(\frac{2}{\pi}\sin^{-1}\tfrac{1}{2}r'\right)^2\right]\right\}. \tag{9.7}$$

It may also be shown that

$$0 \le \frac{1}{9} - \left(\frac{2}{\pi}\sin^{-1}\tfrac{1}{2}r'\right)^2 \le \frac{4}{9}\,pq \tag{9.8}$$

and for $n > 10$

$$\frac{2}{n(n-1)}\left\{1 + \frac{2}{9}(n-2)\right\} < \frac{5}{9(n-1)}. \tag{9.9}$$

Substituting in (9.7) we find

$$\operatorname{var} t \le \frac{20\,pq}{9(n-1)} \tag{9.10}$$

$$= \frac{5(1-t^2)}{9(n-1)}. \tag{9.11}$$

The upper limit is, in fact, nearly attained in a number of cases, so that (9.11) gives us a fairly good estimate of the actual variance.
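How close the upper limit lies to the variance itself can be seen by evaluating both. The sketch below assumes the reading of the variance formula as $\frac{2}{n(n-1)}\{1 - (\frac{2}{\pi}\sin^{-1}\rho')^2 + 2(n-2)[\frac{1}{9} - (\frac{2}{\pi}\sin^{-1}\frac{1}{2}\rho')^2]\}$ and of the limit as $5(1-t^2)/\{9(n-1)\}$; the function names are illustrative.

```python
import math


def var_t_normal_parent(n, rho_parent):
    """Variance of t for samples of n from a normal parent (assumed reading of the formula)."""
    whole = (2.0 / math.pi) * math.asin(rho_parent)
    half = (2.0 / math.pi) * math.asin(0.5 * rho_parent)
    bracket = 1.0 - whole ** 2 + 2.0 * (n - 2) * (1.0 / 9.0 - half ** 2)
    return 2.0 * bracket / (n * (n - 1))


def var_t_upper_limit(n, t):
    """Upper limit 5(1 - t^2) / (9(n - 1)), stated in the text for n > 10."""
    return 5.0 * (1.0 - t ** 2) / (9.0 * (n - 1))


# Compare the two, taking t at its expected value for each parent correlation.
for n in (11, 20, 50):
    for rho_parent in (0.0, 0.5, 0.9):
        t = (2.0 / math.pi) * math.asin(rho_parent)
        exact = var_t_normal_parent(n, rho_parent)
        limit = var_t_upper_limit(n, t)
        print(n, rho_parent, round(exact, 5), round(limit, 5))
```

When the parent correlation is zero the first function reduces to the familiar null-case variance $2(2n+5)/\{9n(n-1)\}$, which is a convenient check on the reading adopted.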


9.5 To the degree of approximation with which we are here concerned ($t$ being written for $\tau$ in large samples), we may compare this with (4.11) written in the form

$$\operatorname{var} t \le \frac{2}{n}(1 - t^2). \tag{9.12}$$

Apart from the difference in the factors $n$ and $n - 1$, which is not important for large samples, we see on comparing (9.11) with (9.12) that the former gives a limit which is only 0.278 times that of the latter, the standard error being accordingly only 0.53 or little more than half as great. This is the gain in accuracy which we acquire at the expense of assuming that the population is normal.


9.6 Since

$$r' = \sin\tfrac{1}{2}\pi t$$

we have, for small variations,

$$\delta r' = \frac{\pi}{2}\cos\tfrac{1}{2}\pi t\;\delta t.$$

Squaring and summing for all such variations, we find

$$\operatorname{var} r' = \frac{\pi^2}{4}(1 - r'^2)\operatorname{var} t \tag{9.13}$$

$$= \frac{\pi^2(1 - r'^2)}{2n(n-1)}\left\{1 - \left(\frac{2}{\pi}\sin^{-1}\rho'\right)^2 + 2(n-2)\left[\frac{1}{9} - \left(\frac{2}{\pi}\sin^{-1}\tfrac{1}{2}\rho'\right)^2\right]\right\}. \tag{9.14}$$

By the use of (9.11) we have

$$\operatorname{var} r' \le \frac{5\pi^2}{9}\,pq\,\frac{1 - r'^2}{n-1}, \qquad n > 10, \tag{9.15}$$

$$\le (2.34)^2\,pq\,\frac{1 - r'^2}{n-1}. \tag{9.16}$$

If we use $r'$ to estimate $\rho'$, (9.16) provides an estimate of an upper limit to the standard error of the estimate. It is interesting to compare this with the standard error of the product-moment sample correlation, say $r''$, viz.

$$\operatorname{var} r'' = \frac{(1 - r''^2)^2}{n}. \tag{9.17}$$

Taking the upper limit in (9.16) and ignoring the difference between $n - 1$ and $n$ we have

$$\sqrt{\frac{\operatorname{var} r'}{\operatorname{var} r''}} = \frac{2.34\,\sqrt{pq\,(1 - r'^2)}}{1 - r''^2}. \tag{9.18}$$

If $\rho' = 0$, $r''$ is approximately zero for large samples and $p$ is approximately equal to $q$, so the ratio of standard errors given by (9.18) is approximately 1.17. If $\rho' = 0.9$ we may put approximately

$$t = \frac{2}{\pi}\sin^{-1} 0.9 = 0.713, \qquad pq = \tfrac{1}{4}(1 - t^2) = 0.123.$$

The ratio of standard errors then becomes approximately 1.88. Evidently the product-moment estimate $r''$ is more accurate for $\rho'$ than is $r'$, in the sense that it has a smaller standard error and therefore is more likely to be nearer the true value $\rho'$.
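The two ratios quoted, 1.17 and 1.88, can be reproduced under the large-sample simplifications used in the text: $r'$ and $r''$ are both put equal to $\rho'$, $t$ is put at its expected value, and $pq = \frac{1}{4}(1 - t^2)$. The function name is illustrative, and the ratio is taken to be $2.34\sqrt{pq(1 - r'^2)}/(1 - r''^2)$, which is an assumed reading of the formula.

```python
import math


def standard_error_ratio(rho_parent):
    """Approximate ratio (s.e. of r') / (s.e. of r'') under the simplifications above."""
    t = (2.0 / math.pi) * math.asin(rho_parent)   # expected value of t
    pq = 0.25 * (1.0 - t ** 2)                    # from 1 - t^2 = 4pq
    one_minus_r_squared = 1.0 - rho_parent ** 2   # r' and r'' both replaced by rho'
    return 2.34 * math.sqrt(pq * one_minus_r_squared) / one_minus_r_squared


ratio_at_zero = standard_error_ratio(0.0)
ratio_at_point_nine = standard_error_ratio(0.9)
print(round(ratio_at_zero, 2), round(ratio_at_point_nine, 2))
```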


Relation between ρ and parent correlation in the normal case

9.7 We now consider the similar [...] $\rho'$ to the sample value of Spearman's [...] The result which we shall [...] as the proportional frequency of members with variate-values less than or equal to the value of the individual. If a population is discontinuous we may still define a grade, but conventionally we regard the individual as divided into halves, one allocated to the lower and one to the upper part of the range. Thus, if a member is ranked as $j$ we shall take its grade as

$$\frac{j - \tfrac{1}{2}}{n}. \tag{9.19}$$

Although we cannot define a rank correlation for a continuous population, we can define a grade correlation simply as the product-moment correlation of the grades; and this reduces to the rank correlation in a finite sample in virtue of (9.19). In any population each value of the grade from 0 to 1 arises equally frequently, and consequently its mean is $\tfrac{1}{2}$. Further, its variance is $\tfrac{1}{12}$. For the grade correlation between grades $\xi$, $\eta$, say $\rho_g$, measured from their mean (corresponding to variate-values $x$, $y$), we then have

$$\rho_g = 12\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \xi\,\eta\, f\,dx\,dy \tag{9.20}$$

where $f$ is the frequency of $x$, $y$. This formula merely expresses that we have summed the product $\xi\eta$ over all possible values to arrive at the grade correlation.

It may be shown, as in the next chapter, that if $\rho'$ is the parent correlation

$$\rho' = 2\sin\frac{\pi\rho_g}{6} \tag{9.21}$$

and thus the parent $\rho'$ can be obtained from the parent grade correlation.

Since grade correlation reduces to the Spearman rank correlation in a sample it seems reasonable to take the sample Spearman rank correlation $r$ as an estimate of $\rho_g$. We may then take as our estimate of $\rho'$, say $r'$,

$$r' = 2\sin\frac{\pi r}{6} \tag{9.22}$$

by analogy with (9.21).

The reader must at this point take a firm grip on a rather confusing notation. In the normal parent population we have a correlation $\rho'$ and also a grade correlation $\rho_g$ related to it by (9.21). There is no parent value of the Spearman correlation $\rho$. In the sample we have a product-moment correlation (which we denote by $r''$), a Spearman rank correlation $r$ which may be regarded as the sample value of $\rho_g$, and an estimate of $\rho'$ denoted by $r'$, related to $r$ by (9.22) or to $t$ by (9.2), according to which method we are using.


9.8 In arriving at (9.22) we have evidently made some assumptions which need to be remembered when we are interpreting results. Comparing with (9.2), we see that the ratio of the two estimates of $\rho'$ is

$$\frac{r'\ (\text{from } t)}{r'\ (\text{from } r)} = \frac{\sin\tfrac{1}{2}\pi t}{2\sin\tfrac{1}{6}\pi r}.$$

Now for $t$ and $r$ not too close to unity (and in any case between $-1$ and $+1$) $\sin\tfrac{1}{2}\pi t$ is fairly close to $\tfrac{1}{2}\pi t$, and $\sin\tfrac{1}{6}\pi r$ is quite close to $\tfrac{1}{6}\pi r$. The ratio is therefore approximately

$$\frac{3t}{2r}.$$

Remembering the relationship of 1.20, we see that this ratio is approximately unity. Our various approximations, however rough, are at least consistent.
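The consistency of the two estimates is easily illustrated. In the sketch below the sample values $t = 0.4$ and $r = 0.6$ are hypothetical, chosen so that $r$ is about three-halves of $t$; the function names are likewise illustrative.

```python
import math


def estimate_from_tau(t):
    """Estimate of the parent correlation from the sample value t."""
    return math.sin(0.5 * math.pi * t)


def estimate_from_spearman(r):
    """Estimate of the parent correlation from the sample Spearman coefficient r."""
    return 2.0 * math.sin(math.pi * r / 6.0)


t_sample, r_sample = 0.4, 0.6   # hypothetical values with r = 3t/2
exact_ratio = estimate_from_tau(t_sample) / estimate_from_spearman(r_sample)
approximate_ratio = 3.0 * t_sample / (2.0 * r_sample)
print(round(exact_ratio, 3), round(approximate_ratio, 3))
```

The two ratios differ by a few per cent at these values, the difference growing as $t$ and $r$ approach unity.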


9.9 For large samples we have

$$\delta r' = \frac{\pi}{3}\cos\frac{\pi r}{6}\,\delta r$$

and thus

$$\operatorname{var} r' = \frac{\pi^2}{9}\left(1 - \tfrac{1}{4}r'^2\right)\operatorname{var} r.$$

Now we know that in the null case when $\rho' = 0$, $\operatorname{var} r = 1/(n-1)$. Hence, in that case,

$$\operatorname{var} r' = \frac{\pi^2}{9(n-1)}.$$

An exact expression when $\rho' \ne 0$ is not known. A formula derived by K. Pearson (1907) after some heroic analysis is

$$\operatorname{var} r' = \frac{\pi^2}{9n}(1 - r'^2)^2\left(1 + 0.083{,}4493\,r'^2 + 0.017{,}1493\,r'^4 + 0.004{,}2797\,r'^6 + 0.000{,}4559\,r'^8 + \dots\right) \tag{9.23}$$

whence

$$\text{s.e. } r' = \frac{1.047\,(1 - r'^2)}{\sqrt{n}}\left(1 + 0.042\,r'^2 + 0.008\,r'^4 + 0.002\,r'^6\right). \tag{9.24}$$

If we compare the standard error of $r'$ given by (9.24) with that given by (9.17) we find a ratio of approximately $1.047\,(1 + 0.042\,r'^2)$. Thus when $\rho' = 0$ the standard error of $r'$ estimated from Spearman's coefficient is about 4.7 per cent greater than an estimate from the product-moment, not a large excess. When $\rho' = 0.9$ the former is about 10 per cent greater.
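The percentages quoted follow directly from the rounded series for the standard error. A minimal sketch, assuming the rounded coefficients 0.042, 0.008 and 0.002 and the leading factor 1.047 as read from the text (the function name is illustrative):

```python
def spearman_to_product_moment_se_ratio(r_prime):
    """Ratio of the s.e. of r' (from Spearman's coefficient) to that of the product-moment r''."""
    series = 1.0 + 0.042 * r_prime ** 2 + 0.008 * r_prime ** 4 + 0.002 * r_prime ** 6
    return 1.047 * series


excess_at_zero = spearman_to_product_moment_se_ratio(0.0) - 1.0
excess_at_point_nine = spearman_to_product_moment_se_ratio(0.9) - 1.0
print(round(100 * excess_at_zero, 1), round(100 * excess_at_point_nine, 1))
```

Both standard errors are here taken at the same value of the coefficient, so that the common factor $(1 - r'^2)/\sqrt{n}$ cancels.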


9.10 The above methods are mainly of use in estimating a putative parent correlation from observed rank correlations on the assumption that the variates are normally distributed. Unless we are given variate-values instead of ranks in the sample it is impossible to test from this evidence whether the hypothesis of normality is legitimate or not. The methods are therefore applicable only if we have collateral evidence in favour of normality. In the contrary case it is better to work direct with the rank correlations.
CHAPTER 10

PROOF OF THE RESULTS OF CHAPTER 9

10.1 Let $\operatorname{sgn}\xi$ stand for $+1$ if $\xi$ is positive, zero if $\xi$ is zero and $-1$ if $\xi$ is negative. We shall require the result that for real $\xi$

$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{i\xi t}}{it}\,dt = \operatorname{sgn}\xi = \begin{cases} 1, & \xi > 0, \\ 0, & \xi = 0, \\ -1, & \xi < 0. \end{cases} \tag{10.1}$$

The integral is to be understood as a principal value, that is to say

$$\lim_{\varepsilon \to 0,\ R \to \infty}\left\{\int_{-R}^{-\varepsilon} + \int_{\varepsilon}^{R}\right\}.$$

Equation (10.1) is equivalent to the real integral

$$\operatorname{sgn}\xi = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin \xi t}{t}\,dt. \tag{10.2}$$

From the definition as a principal value it is clear that if $\xi = 0$ the integral vanishes in virtue of the symmetry of the integrand. Perhaps the quickest way of establishing (10.1) is to consider the complex integral

$$\int \frac{e^{i\xi z}}{iz}\,dz.$$

If $\xi$ is positive we take this round the contour consisting of the real axis from $-R$ to $-\varepsilon$, the small semicircle of radius $\varepsilon$ above the axis, the real axis from $\varepsilon$ to $R$ and the large semicircle of radius $R$ above the axis. This integral vanishes, for the integrand has no poles inside the contour. The integral along the real axis tends to

$$\int_{-\infty}^{\infty}\frac{e^{i\xi t}}{it}\,dt.$$

The integral round the large semicircle tends to zero as $R$ tends to infinity. The integral round the small semicircle is effectively the integral of $dz/iz$ round that semicircle clockwise and is $-\pi$. Thus

$$\int_{-\infty}^{\infty}\frac{e^{i\xi t}}{it}\,dt = \pi$$

whence the result (10.1) for $\xi > 0$ follows. If $\xi < 0$ we consider the integral with the sign of $z$ changed.


10.2 Consider now a normal population of variates $x$, $y$ with correlation $\rho'$. Its equation will be

$$dF = \frac{1}{2\pi\sqrt{1 - \rho'^2}}\exp\left\{-\frac{x^2 - 2\rho' xy + y^2}{2(1 - \rho'^2)}\right\}dx\,dy. \tag{10.3}$$

We lose no generality by taking the variates measured from zero means with unit variances. If we take a pair of values of $x$, say $x_1$ and $x_2$, we may allot a score based on $x_1 - x_2$ and for the calculation of $\tau$ can take this score to be $\operatorname{sgn}(x_1 - x_2)$ or some convenient positive numerical multiple of $x_1 - x_2$. The distribution of a pair of independent values $x_1$ and $x_2$, $y_1$ and $y_2$ is

$$dF \propto \exp\left[-\frac{1}{2(1 - \rho'^2)}\left\{x_1^2 - 2\rho' x_1 y_1 + y_1^2 + x_2^2 - 2\rho' x_2 y_2 + y_2^2\right\}\right]dx_1\,dx_2\,dy_1\,dy_2. \tag{10.4}$$

Put

$$u_1 = \frac{1}{\sqrt{2}}(x_1 - x_2), \qquad u_2 = \frac{1}{\sqrt{2}}(x_1 + x_2),$$

$$v_1 = \frac{1}{\sqrt{2}}(y_1 - y_2), \qquad v_2 = \frac{1}{\sqrt{2}}(y_1 + y_2).$$

The distribution then becomes

$$dF \propto \exp\left[-\frac{1}{2(1 - \rho'^2)}\left(u_1^2 - 2\rho' u_1 v_1 + v_1^2\right)\right]du_1\,dv_1 \times \exp\left[-\frac{1}{2(1 - \rho'^2)}\left(u_2^2 - 2\rho' u_2 v_2 + v_2^2\right)\right]du_2\,dv_2. \tag{10.5}$$

Consequently $u_1$ and $v_1$ are also distributed normally with correlation $\rho'$ independently of $u_2$ and $v_2$. Dropping the suffixes we have

$$dF \propto \exp\left[-\frac{1}{2(1 - \rho'^2)}\left\{u^2 - 2\rho' uv + v^2\right\}\right]du\,dv. \tag{10.6}$$
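10.3 Now if $t$ is a sample value of $\tau$, $E(t)$ is the expectation of the sum of $\tfrac{1}{2}n(n-1)$ terms, each of which may be written $\operatorname{sgn}u\,\operatorname{sgn}v$, divided by $\tfrac{1}{2}n(n-1)$. Thus

$$E(t) = E(\operatorname{sgn}u\,\operatorname{sgn}v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\operatorname{sgn}u\,\operatorname{sgn}v\;dF$$

which in consequence of (10.1) becomes

$$E(t) = \frac{1}{\pi^2}\int_{-\infty}^{\infty}\frac{dt_1}{it_1}\int_{-\infty}^{\infty}\frac{dt_2}{it_2}\left\{\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{it_1 u + it_2 v}\,dF\right\}. \tag{10.7}$$

The expression in curly brackets is the characteristic function of $F$ and is equal to *

$$\exp\left\{-\tfrac{1}{2}\left(t_1^2 + 2\rho' t_1 t_2 + t_2^2\right)\right\}. \tag{10.8}$$

Hence

$$E(t) = -\frac{1}{\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{dt_1\,dt_2}{t_1 t_2}\exp\left\{-\tfrac{1}{2}\left(t_1^2 + 2\rho' t_1 t_2 + t_2^2\right)\right\}.$$

Thus

$$\frac{\partial E(t)}{\partial\rho'} = \frac{1}{\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}dt_1\,dt_2\,\exp\left\{-\tfrac{1}{2}\left(t_1^2 + 2\rho' t_1 t_2 + t_2^2\right)\right\} = \frac{2}{\pi\sqrt{1 - \rho'^2}}. \tag{10.9}$$

Hence, by a simple integration for $\rho'$, remembering that $E(t)$ vanishes when $\rho' = 0$, we have

$$E(t) = \frac{2}{\pi}\sin^{-1}\rho' \tag{10.10}$$

which is the result given in (9.1).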
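The result may be checked by simulation. The sketch below is not part of the proof: it draws pairs $(u, v)$ from a standardised bivariate normal population with a given correlation, averages $\operatorname{sgn}u\,\operatorname{sgn}v$, and compares the average with $\frac{2}{\pi}\sin^{-1}\rho'$. The function names, the number of pairs and the seed are arbitrary choices.

```python
import math
import random


def sign(x):
    """Return +1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def simulated_expected_tau(rho_parent, pairs=20000, seed=1):
    """Monte Carlo average of sgn(u) * sgn(v) for a bivariate normal parent."""
    rng = random.Random(seed)
    total = 0
    for _ in range(pairs):
        u = rng.gauss(0.0, 1.0)
        # v has unit variance and correlation rho_parent with u.
        v = rho_parent * u + math.sqrt(1.0 - rho_parent ** 2) * rng.gauss(0.0, 1.0)
        total += sign(u) * sign(v)
    return total / pairs


simulated = simulated_expected_tau(0.5)
theoretical = (2.0 / math.pi) * math.asin(0.5)   # equals 1/3
print(round(simulated, 3), round(theoretical, 3))
```

With 20,000 pairs the sampling error of the simulated average is below 0.01, so agreement to about two decimal places is all that should be expected.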


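10.4 To find the variance of $t$ in all possible samples we require

$$\binom{n}{2}^{2}E(t^2) = E\left(\sum\operatorname{sgn}u\,\operatorname{sgn}v\right)^2$$

where summation extends over all $\binom{n}{2}$ values of $u$ and $v$. We may write

$$\left(\sum\operatorname{sgn}u\,\operatorname{sgn}v\right)^2 = \sum\operatorname{sgn}u_{ij}\,\operatorname{sgn}u_{kl}\,\operatorname{sgn}v_{ij}\,\operatorname{sgn}v_{kl}$$

and there are three types of case:

(i) If $i = k$, $j = l$ the term is $+1$ and the expectation of each term is $+1$.

* Kendall, Advanced Theory, Vol. I, Example 3.15, p. 79. The integral is easily evaluated directly.

(ii) If $i \ne k$, $j \ne l$ (the two pairs having no member in common), the expectation of the term reduces to

$$E(\operatorname{sgn}u_{ij}\,\operatorname{sgn}v_{ij})\,E(\operatorname{sgn}u_{kl}\,\operatorname{sgn}v_{kl}) = \left(\frac{2}{\pi}\sin^{-1}\rho'\right)^2 \tag{10.11}$$

from (10.10).

(iii) If $i = k$ or $j = l$ but not both, we have a type which may be evaluated by considering the case $i = k = 1$, $j = 2$, $l = 3$. Writing a single integral sign for convenience, let

$$M = E(\operatorname{sgn}u_{12}\,\operatorname{sgn}v_{12}\,\operatorname{sgn}u_{13}\,\operatorname{sgn}v_{13}) = \frac{1}{\pi^4}\int\frac{dt_1\,dt_2\,dt_3\,dt_4}{it_1\,it_2\,it_3\,it_4}\int f e^{i\Theta}\,dx_1\,dx_2\,dx_3\,dy_1\,dy_2\,dy_3 \tag{10.12}$$

where we now write (dropping $\sqrt{2}$) $u_{12} = x_1 - x_2$ etc. and thus

$$\Theta = (x_1 - x_2)t_1 + (y_1 - y_2)t_2 + (x_1 - x_3)t_3 + (y_1 - y_3)t_4$$

$$= x_1(t_1 + t_3) - x_2 t_1 - x_3 t_3 + y_1(t_2 + t_4) - y_2 t_2 - y_3 t_4. \tag{10.13}$$

For the integration of $f e^{i\Theta}$ over the values of $x$ and $y$ we may use the known properties of characteristic functions or integrate directly to find

$$T = \int f e^{i\Theta}\,dx_1 \dots dy_3 = \exp\left[-\tfrac{1}{2}\left\{(t_1 + t_3)^2 + (t_2 + t_4)^2 + t_1^2 + t_2^2 + t_3^2 + t_4^2 + 2\rho'\left((t_1 + t_3)(t_2 + t_4) + t_1 t_2 + t_3 t_4\right)\right\}\right]$$

and thus

$$\frac{\partial T}{\partial\rho'} = -\left(2t_1 t_2 + 2t_3 t_4 + t_1 t_4 + t_2 t_3\right)T. \tag{10.14}$$

If we differentiate $M$ of (10.12) with respect to $\rho'$ and substitute from (10.14) we have an expression

$$\frac{\partial M}{\partial\rho'} = \frac{2}{\pi^4}\int\frac{dt_1\,dt_2\,dt_3\,dt_4}{it_3\,it_4}T + \frac{2}{\pi^4}\int\frac{dt_1\,dt_2\,dt_3\,dt_4}{it_1\,it_2}T + \frac{1}{\pi^4}\int\frac{dt_1\,dt_2\,dt_3\,dt_4}{it_2\,it_3}T + \frac{1}{\pi^4}\int\frac{dt_1\,dt_2\,dt_3\,dt_4}{it_1\,it_4}T.$$

Because of the symmetry of $T$ under the interchanges $(t_1, t_2) \leftrightarrow (t_3, t_4)$ and $(t_1, t_3) \leftrightarrow (t_2, t_4)$, this may be reduced to

$$\frac{\partial M}{\partial\rho'} = \frac{4}{\pi^4}\int\frac{dt_1\,dt_2\,dt_3\,dt_4}{it_3\,it_4}T + \frac{2}{\pi^4}\int\frac{dt_1\,dt_2\,dt_3\,dt_4}{it_2\,it_3}T. \tag{10.15}$$

We now carry out the integration in the first part with respect to $t_1$ and $t_2$, obtaining

$$\int\int T\,dt_1\,dt_2 = \frac{\pi}{\sqrt{1 - \rho'^2}}\exp\left\{-\tfrac{3}{4}\left(t_3^2 + 2\rho' t_3 t_4 + t_4^2\right)\right\}.$$

The remaining part of the integration can now be completed in the manner which led to the evaluation of (10.7).

Similarly the second integral in (10.15) may be evaluated. We arrive at

$$\frac{\partial M}{\partial\rho'} = \frac{8}{\pi^2}\left\{\frac{\sin^{-1}\rho'}{\sqrt{1 - \rho'^2}} - \frac{\sin^{-1}\tfrac{1}{2}\rho'}{\sqrt{4 - \rho'^2}}\right\}. \tag{10.16}$$

When $\rho' = 1$, $M = 1$, and by a further integration we find

$$M = \left(\frac{2}{\pi}\sin^{-1}\rho'\right)^2 - \left(\frac{2}{\pi}\sin^{-1}\tfrac{1}{2}\rho'\right)^2 + \frac{1}{9}. \tag{10.17}$$

There are $\binom{n}{2}$ cases of type (i), $\binom{n}{2}\binom{n-2}{2}$ cases of type (ii), and $6\binom{n}{3}$ cases of type (iii). Thus

$$E(t^2) = \binom{n}{2}^{-2}\left[\binom{n}{2} + \binom{n}{2}\binom{n-2}{2}\left(\frac{2}{\pi}\sin^{-1}\rho'\right)^2 + 6\binom{n}{3}\left\{\frac{1}{9} + \left(\frac{2}{\pi}\sin^{-1}\rho'\right)^2 - \left(\frac{2}{\pi}\sin^{-1}\tfrac{1}{2}\rho'\right)^2\right\}\right]. \tag{10.18}$$

Subtracting the square of $E(t)$, we find, after a little rearrangement,

$$\operatorname{var} t = \frac{2}{n(n-1)}\left[1 - \left(\frac{2}{\pi}\sin^{-1}\rho'\right)^2 + 2(n-2)\left\{\frac{1}{9} - \left(\frac{2}{\pi}\sin^{-1}\tfrac{1}{2}\rho'\right)^2\right\}\right] \tag{10.19}$$

which is the result given in 9.4. The similar result in terms of $p$ and $q$ was derived in that section, leading to (9.7) with $r'$ written for $\rho'$.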
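10.5 We may approximate to this formula as follows: Put

$$\sin^{-1}\tfrac{1}{2}\rho' = \alpha, \qquad \tfrac{1}{3}\sin^{-1}\rho' = \beta;$$

then

$$2\sin\alpha = \sin 3\beta = 3\sin\beta\left(1 - \tfrac{4}{3}\sin^2\beta\right).$$

But $|\beta| \le \pi/6$ and hence

$$1 - \tfrac{4}{3}\sin^2\beta \ge \tfrac{2}{3},$$

i.e. $|\alpha| \ge |\beta|$, and hence

$$0 \le \frac{1}{9} - \left(\frac{2}{\pi}\sin^{-1}\tfrac{1}{2}\rho'\right)^2 \le \frac{1}{9}\left\{1 - \left(\frac{2}{\pi}\sin^{-1}\rho'\right)^2\right\} \le \frac{4}{9}\,pq \tag{10.20}$$

where $p$, $q$ are defined in (9.5). Furthermore,

$$\frac{2}{n(n-1)}\left\{1 + \frac{2}{9}(n-2)\right\} < \frac{5}{9(n-1)}, \qquad n > 10. \tag{10.21}$$

Using these results in (10.19) we find

$$\operatorname{var} t \le \frac{20\,pq}{9(n-1)} = \frac{5(1 - t^2)}{9(n-1)} \tag{10.22}$$

as given in (9.10) and (9.11).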


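10.6 We turn now to the parallel set of results for $r'$ estimated from the value of Spearman's $\rho$ for the sample which we write as $r$. Suppose $x$, $y$ are distributed in the bivariate normal form

$$f = \frac{1}{2\pi\sqrt{1 - \rho'^2}}\exp\left[-\frac{1}{2(1 - \rho'^2)}\left(x^2 - 2\rho' xy + y^2\right)\right]. \tag{10.23}$$

We define the grades, measured about their mean $\tfrac{1}{2}$, by

$$\xi = \int_{-\infty}^{x}\int_{-\infty}^{\infty}f\,dy\,dx - \tfrac{1}{2} = \frac{1}{\sqrt{2\pi}}\int_{0}^{x}e^{-\frac{1}{2}x^2}\,dx, \tag{10.24}$$

$$\eta = \int_{-\infty}^{y}\int_{-\infty}^{\infty}f\,dx\,dy - \tfrac{1}{2} = \frac{1}{\sqrt{2\pi}}\int_{0}^{y}e^{-\frac{1}{2}y^2}\,dy. \tag{10.25}$$

Thus

$$\rho_g = 12\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\xi\,\eta\,f\,dx\,dy. \tag{10.26}$$

Now

$$\log f = -\frac{1}{2(1 - \rho'^2)}\left(x^2 - 2\rho' xy + y^2\right) - \tfrac{1}{2}\log(1 - \rho'^2) - \text{constant}.$$

Thus

$$\frac{1}{f}\frac{\partial f}{\partial\rho'} = \frac{\rho'}{1 - \rho'^2} + \frac{xy}{1 - \rho'^2} - \frac{\rho'\left(x^2 - 2\rho' xy + y^2\right)}{(1 - \rho'^2)^2} = \frac{1}{f}\frac{\partial^2 f}{\partial x\,\partial y}$$

and hence

$$\frac{d\rho_g}{d\rho'} = 12\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\xi\,\eta\,\frac{\partial^2 f}{\partial x\,\partial y}\,dx\,dy. \tag{10.27}$$

By a partial integration with respect to $x$ this is reduced to

$$\frac{d\rho_g}{d\rho'} = 12\int_{-\infty}^{\infty}\eta\,dy\left[\xi\,\frac{\partial f}{\partial y}\right]_{x=-\infty}^{x=\infty} - 12\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\eta\,\frac{\partial\xi}{\partial x}\,\frac{\partial f}{\partial y}\,dx\,dy.$$

The first term vanishes and hence

$$\frac{d\rho_g}{d\rho'} = -12\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\eta\,\frac{\partial\xi}{\partial x}\,\frac{\partial f}{\partial y}\,dx\,dy.$$

By a partial integration with respect to $y$ we find

$$\frac{d\rho_g}{d\rho'} = 12\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{\partial\xi}{\partial x}\,\frac{\partial\eta}{\partial y}\,f\,dx\,dy$$

whence from (10.24) and (10.25)

$$\frac{d\rho_g}{d\rho'} = \frac{12}{4\pi^2\sqrt{1 - \rho'^2}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp\left[-\frac{1}{2(1 - \rho'^2)}\left\{(2 - \rho'^2)x^2 - 2\rho' xy + (2 - \rho'^2)y^2\right\}\right]dx\,dy = \frac{6}{\pi\sqrt{4 - \rho'^2}}$$

giving, since $\rho_g$ and $\rho'$ vanish together,

$$\rho_g = \frac{6}{\pi}\sin^{-1}\tfrac{1}{2}\rho', \qquad \rho' = 2\sin\frac{\pi\rho_g}{6} \tag{10.28}$$

as given in (9.21).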
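The relation between the grade correlation and the parent correlation can also be checked by simulation. The sketch below estimates twelve times the mean product of the grades (measured from $\tfrac{1}{2}$) for a standardised bivariate normal population and compares it with $\frac{6}{\pi}\sin^{-1}\frac{1}{2}\rho'$. The sample size, the seed and the function names are arbitrary; `statistics.NormalDist` from the Python standard library supplies the normal integral.

```python
import math
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def simulated_grade_correlation(rho_parent, pairs=20000, seed=3):
    """Monte Carlo estimate of 12 * E[xi * eta], the grades being measured from 1/2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        x = rng.gauss(0.0, 1.0)
        y = rho_parent * x + math.sqrt(1.0 - rho_parent ** 2) * rng.gauss(0.0, 1.0)
        xi = STANDARD_NORMAL.cdf(x) - 0.5
        eta = STANDARD_NORMAL.cdf(y) - 0.5
        total += xi * eta
    return 12.0 * total / pairs


def grade_correlation(rho_parent):
    """Grade correlation of a normal parent with correlation rho_parent."""
    return (6.0 / math.pi) * math.asin(0.5 * rho_parent)


simulated = simulated_grade_correlation(0.6)
theoretical = grade_correlation(0.6)
print(round(simulated, 3), round(theoretical, 3))
```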


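10.7 Without going into great detail we will sketch Pearson's derivation of the variance of $r'$ given in (9.23). For convenience we drop the prime on $\rho'$ until the end.

For large samples the variance of a product-moment correlation in samples from any population is

$$\operatorname{var} r = \frac{\rho^2}{n}\left\{\frac{\mu_{22}}{\mu_{11}^2} + \frac{1}{4}\left(\frac{\mu_{40}}{\mu_{20}^2} + \frac{\mu_{04}}{\mu_{02}^2} + \frac{2\mu_{22}}{\mu_{20}\mu_{02}}\right) - \left(\frac{\mu_{31}}{\mu_{11}\mu_{20}} + \frac{\mu_{13}}{\mu_{11}\mu_{02}}\right)\right\} \tag{10.29}$$

where the $\mu$'s are product-moments of the parent.*

The variates in this case are the grades $\xi$ and $\eta$ as defined in (10.24) and (10.25), and the correlation is $\rho_g$. It may be shown that $\mu_{31} = \mu_{13}$ and we have

$$\mu_{20} = \mu_{02} = \tfrac{1}{12}, \qquad \mu_{40} = \mu_{04} = \tfrac{1}{80}.$$

Hence (10.29) becomes ($r$ being the sample value of $\rho_g$)

$$\operatorname{var} r = \frac{1}{n}\left\{\frac{\mu_{22}}{\mu_{20}^2}\left(1 + \tfrac{1}{2}\rho_g^2\right) - 2\rho_g\frac{\mu_{31}}{\mu_{20}^2} + \frac{9}{10}\rho_g^2\right\}. \tag{10.30}$$

By expanding the integral giving $\mu_{22}$ in powers of $\rho$ we find

$$\frac{\mu_{22}}{\mu_{20}^2} = 1 + 0.607{,}9271\,\rho^2 + 0.140{,}7239\,\rho^4 + 0.036{,}7758\,\rho^6 + 0.010{,}2587\,\rho^8 + 0.002{,}0933\,\rho^{10} + \dots \tag{10.31}$$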
We also find 
Из 54 
740 = (0'889,8860р — 0-005,4820p? — 0-003,6798р? 
— 0001, 1836 + +++} . (10.32) 


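We next express $\rho_g$ and $\rho_g^2$ in terms of $\rho$ by the known expansions for $\sin^{-1}\tfrac{1}{2}\rho$ and $\left(\sin^{-1}\tfrac{1}{2}\rho\right)^2$. We find

$$1 + \tfrac{1}{2}\rho_g^2 = 1 + 0.455{,}9459\,\rho^2 + 0.037{,}9954\,\rho^4 + 0.005{,}0601\,\rho^6 + 0.000{,}8142\,\rho^8 + \dots$$

$$2\rho_g = 1.909{,}8593\,\rho + 0.079{,}5775\,\rho^3 + 0.009{,}2650\,\rho^5 + 0.001{,}8892\,\rho^7 + \dots \tag{10.33}$$

Finally, substituting from (10.31), (10.32) and (10.33) in (10.30) we find

$$\operatorname{var} r = \frac{1}{n}\left\{1 - 1.666{,}5507\,\rho^2 + 0.433{,}6130\,\rho^4 + 0.161{,}8337\,\rho^6 + 0.049{,}5042\,\rho^8 + \dots\right\}$$

$$= \frac{(1 - \rho^2)^2}{n}\left\{1 + 0.333{,}4493\,\rho^2 + 0.100{,}5116\,\rho^4 + 0.029{,}4076\,\rho^6 + 0.007{,}8078\,\rho^8 + \dots\right\}. \tag{10.34}$$

* See Kendall, Advanced Theory, Vol. I, p. 211.

But

$$\rho = 2\sin\frac{\pi\rho_g}{6}$$

and thus for $r'$ estimated from $r' = 2\sin\dfrac{\pi r}{6}$,

$$\operatorname{var} r' = \frac{\pi^2}{9}\left(1 - \tfrac{1}{4}\rho^2\right)\operatorname{var} r.$$

Thus we find from (10.34)

$$\operatorname{var} r' = \frac{\pi^2}{9}\,\frac{(1 - \rho^2)^2}{n}\left\{1 + 0.083{,}4493\,\rho^2 + 0.017{,}1493\,\rho^4 + 0.004{,}2797\,\rho^6 + 0.000{,}4559\,\rho^8\right\}.$$

Taking the square root, restoring primes and writing $r'$ for $\rho'$ on the right as customary for large samples, we find

$$\sqrt{\operatorname{var} r'} = \frac{1.047\,(1 - r'^2)}{\sqrt{n}}\left\{1 + 0.042\,r'^2 + 0.008\,r'^4 + 0.002\,r'^6\right\} \tag{10.35}$$

as given in (9.24).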
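The last step of the derivation, multiplying the series for $n \operatorname{var} r/(1-\rho^2)^2$ by $1 - \tfrac{1}{4}\rho^2$, is a short piece of arithmetic that can be verified mechanically. In the sketch below the coefficients are those read from the text; the coefficient of $\rho^4$ is taken as 0.100,5116, which is an assumed reading of a doubtful digit, adopted because it is the value consistent with the other series. The names are illustrative.

```python
import math

# Coefficients of rho^0, rho^2, rho^4, rho^6, rho^8 in n * var r / (1 - rho^2)^2.
# The rho^4 entry is an assumed reading of a damaged digit in the source.
VAR_R_SERIES = [1.0, 0.3334493, 0.1005116, 0.0294076, 0.0078078]


def multiply_by_one_minus_quarter_rho_squared(series):
    """Coefficients of (1 - rho^2 / 4) * series, truncated at the same power of rho."""
    product = []
    for index, coefficient in enumerate(series):
        previous = series[index - 1] if index > 0 else 0.0
        product.append(coefficient - 0.25 * previous)
    return product


var_r_prime_series = multiply_by_one_minus_quarter_rho_squared(VAR_R_SERIES)
leading_factor = math.pi / 3.0   # square root of pi^2 / 9

print([round(c, 7) for c in var_r_prime_series], round(leading_factor, 3))
```

The printed coefficients are those of the series for $\operatorname{var} r'$, and the leading factor is the 1.047 of the final formula for the standard error.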
CHAPTER 11

PAIRED COMPARISONS

11.1 Up to this stage we have considered rankings as given by the circumstances of the problem, and have not concerned ourselves with the question whether the data do properly lend themselves to a ranking treatment. Cases often arise, particularly in psychological work, where there is some doubt on this point. Suppose we ask an observer to rank $n$ men in order of intelligence. He may attempt to do so, and may even succeed in producing a ranking, although the nature of "intelligence" is so obscure that we cannot assume the possibility of ordering individuals by reference to it. To some extent we are begging the question by assuming that intelligence is a linear variable. Again, we may ask an observer to rank a number of districts according to his preference for living in them; but his preferences will depend on a number of factors such as cost, availability of transport, height above sea level, or nearness to shopping centres, and it by no means follows that he is capable of expressing a final preference on a linear scale. If we insist on his carrying out a ranking, and even if he complies under the impression that he is doing something within his powers, we may be forcing the data, so to speak, into an over-narrow framework which will distort the true situation. The method we shall discuss in this section is designed to overcome such difficulties.


11.2 We shall suppose that of $n$ objects each of the possible $\tfrac{1}{2}n(n-1)$ pairs is presented to an observer, one pair at a time, and that he records his preference for one member of the pair. If A is preferred to B we may write A → B or B ← A.

Example 11.1

In some experiments on a dog, six different kinds of food were prepared. Each of the $\binom{6}{2} = 15$ possible pairs was offered to the dog and a note was made of which member he took first. (These data are for illustrative purposes and were not a serious attempt to investigate canine preferences for food.) Denoting the six foods by the letters A to F we may record the results as follows:

TABLE 11.1

PREFERENCES OF A DOG FOR SIX FOODS

        A   B   C   D   E   F
   A    -   1   1   0   1   1
   B    0   -   0   1   1   0
   C    0   1   -   1   1   1
   D    1   0   0   -   0   0
   E    0   0   0   1   -   1
   F    0   1   0   1   0   -

For instance an entry 1 in column Y and row X means X → Y and, of course, corresponds to 0 in row Y and column X. Thus, in the above table, A → B, A → C, A ← D, etc. The diagonals are blocked out.

The arrangement of the objects in rows and columns is arbitrary, but it is clearly convenient to have the orders in row and column the same.

The complex of preferences may also be represented diagrammatically. We represent the six objects A to F by the vertices of a regular polygon as in Fig. 11.1. The vertices are joined in all possible ways by straight lines and if X → Y we draw an arrow on the join XY pointing from X to Y.


11.3 If an observer expresses preferences for three objects X, Y, Z as X → Y → Z → X or X ← Y ← Z ← X we shall say that the triad is "circular" or "inconsistent". In the triangle XYZ all the arrows in the diagram of type 11.1 go round the same way. In Fig. 11.1 the triads ACD, BEF and three others are circular.

Clearly circular triads cannot arise in ordinary ranking, for if X → Y and Y → Z then X → Z. It is then a necessary and sufficient condition for the possibility of expressing the preferences as a ranking that no circular triads be present. The more circular triads there are, so to speak, the further we depart from the ranking situation towards a position of inconsistency under which X may be preferred to Y and Y to Z but nevertheless Z is preferred to X.


11.4 It is possible to have circular polyads of extent greater than three. For instance if A → B → C → D → A the tetrad ABCD is circular. A circular $n$-ad must, however, contain at least $n - 2$ circular triads but it may contain more; and the fact that it contains circular triads does not imply that it is itself circular. Suppose, for instance, that ABCD is circular. Then either A → C or C → A. In the first case ACD is circular, in the second ABC. Similarly either ABD or BCD is circular. Thus the tetrad must contain at least two circular triads. On the other hand, the scheme expressed by A → B → C ← D → A, B → D, C → A contains the circular triads ABC and ABD, but ABCD is not circular.

We shall therefore concentrate on circular triads which compose the elementary inconsistencies of the situation, and shall ignore the more ambiguous criteria based on polyads of greater extent.


11.5 It will be shown in the next chapter that if $n$ is odd the maximum number of circular triads is $\tfrac{1}{24}(n^3 - n)$, and if $n$ is even the maximum number is $\tfrac{1}{24}(n^3 - 4n)$. The minimum number is zero. We may therefore define a coefficient of consistence by the equations

$$\zeta = 1 - \frac{24d}{n^3 - n}, \quad n \text{ odd}; \qquad \zeta = 1 - \frac{24d}{n^3 - 4n}, \quad n \text{ even}, \tag{11.1}$$

where $d$ is the observed number of circular triads. If and only if $\zeta$ is unity there are no circular triads, and the data may be ranked.

For example, in the data of Example 11.1 there are 5 circular triads. The maximum number is 8, so $\zeta = 0.375$.
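The coefficient of consistence is simple to compute once the number of circular triads is known. The sketch below assumes the maxima $(n^3 - n)/24$ for odd $n$ and $(n^3 - 4n)/24$ for even $n$; the function names are illustrative.

```python
def maximum_circular_triads(n):
    """Greatest possible number of circular triads among n objects."""
    if n % 2 == 1:
        return (n ** 3 - n) // 24
    return (n ** 3 - 4 * n) // 24


def coefficient_of_consistence(n, d):
    """Coefficient of consistence: 1 for no circular triads, 0 for the maximum number."""
    maximum = maximum_circular_triads(n)
    if maximum == 0:
        raise ValueError("circular triads are impossible for fewer than three objects")
    if not 0 <= d <= maximum:
        raise ValueError("d lies outside the possible range")
    return 1.0 - d / maximum


dog_example = coefficient_of_consistence(n=6, d=5)   # data of Example 11.1
print(maximum_circular_triads(6), dog_example)
```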


11.6 We may, in a certain sense, test the significance of a value of $\zeta$ by considering the distribution it would have if all the preferences were allotted at random. This will tell us whether the observed $\zeta$ could have arisen by chance if the observer was completely incompetent, or, alternatively, whether there is some degree of consistence in his preferences notwithstanding a lack of perfection.

Appendix Table 9 gives the probabilities that certain values of $d$ will be attained or exceeded for $n = 2$ to 7, on the assumption that preferences are allotted purely at random so that any preference scheme is as probable as any other. These distributions are rather troublesome to obtain, and in practice are not often required for higher values of $n$; but when a test is required it may be derived from the $\chi^2$-distribution, to which that of $d$ tends as $n$ increases. In fact, writing

$$\nu = \frac{n(n-1)(n-2)}{(n-4)^2}, \qquad \chi^2 = \frac{8}{n-4}\left\{\frac{1}{4}\binom{n}{3} - d + \frac{1}{2}\right\} + \nu, \tag{11.2}$$

$\chi^2$ is distributed approximately in the usual form with $\nu$ degrees of freedom. The distribution is, however, measured from higher to lower values of $d$, so that the probability that $d$ will be attained or exceeded is the complement of the probability for $\chi^2$. The following example illustrates the point.

Example 11.2

In a set of 7 a value of $d$ equal to 13 is observed. From (11.2) we have

$$\nu = \frac{7 \times 6 \times 5}{9} = 23.33,$$

$$\chi^2 = \tfrac{8}{3}\left(8.75 - 13 + \tfrac{1}{2}\right) + 23.33 = 13.33.$$

From Appendix Table 8 we see that these values correspond approximately to a significance level of about 0.95. The appropriate significance level for $d$ is thus $1 - 0.95 = 0.05$. The exact value of the probability, from Appendix Table 9, is 0.036. The approximation is fair for such a low value of $n$ as 7.
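The arithmetic of Example 11.2 may be sketched as follows. The formulae assumed are $\nu = n(n-1)(n-2)/(n-4)^2$ and $\chi^2 = \frac{8}{n-4}\{\frac{1}{4}\binom{n}{3} - d + \frac{1}{2}\} + \nu$; the function name is illustrative, and the reference to the tabulated $\chi^2$ distribution is left to the reader, as in the text.

```python
from math import comb


def chi_square_for_circular_triads(n, d):
    """Return (chi-square, degrees of freedom) for d circular triads among n objects."""
    if n <= 4:
        raise ValueError("the approximation requires n greater than 4")
    degrees_of_freedom = n * (n - 1) * (n - 2) / (n - 4) ** 2
    chi_square = 8.0 / (n - 4) * (comb(n, 3) / 4.0 - d + 0.5) + degrees_of_freedom
    return chi_square, degrees_of_freedom


chi_square, degrees_of_freedom = chi_square_for_circular_triads(n=7, d=13)
print(round(chi_square, 2), round(degrees_of_freedom, 2))
```

Since $\chi^2$ falls as $d$ rises, a large number of circular triads corresponds to the lower tail of the $\chi^2$ distribution, which is why the complement of the tabulated probability is taken.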


11.7 In a table of the type of 11.1 it is possible to ascertain the number of circular triads $d$ without counting them directly. Suppose the row totals are $a_1, \dots, a_n$. Then

$$d = \tfrac{1}{6}n(n-1)(n-2) - \tfrac{1}{2}\sum_{i=1}^{n}a_i(a_i - 1). \tag{11.3}$$

For example, in Table 11.1 the row totals are 4, 2, 4, 1, 2, 2, totalling $15 = \binom{6}{2}$. Thus

$$\tfrac{1}{2}\sum a(a-1) = \tfrac{1}{2}\sum a^2 - \tfrac{1}{2}\sum a = \tfrac{1}{2}(45) - \tfrac{1}{2}(15) = 15,$$

giving

$$d = 20 - 15 = 5.$$

The same formula applies to column totals, say $b_1, \dots, b_n$. We have, in virtue of the method of construction of the tables,

$$b_i = (n - 1) - a_i.$$

Thus, since $\sum a = \tfrac{1}{2}n(n-1)$,

$$\sum b^2 = n(n-1)^2 - 2(n-1)\sum a + \sum a^2 = \sum a^2$$

and hence

$$\sum b(b-1) = \sum a(a-1).$$

We can also write (11.3) in the form

$$d = \tfrac{1}{12}n(n-1)(2n-1) - \tfrac{1}{2}\sum a^2 \tag{11.4}$$

which is probably the simplest for calculation.
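The count from row totals can be compared with a direct count of circular triads. The sketch below uses the preferences of the dog in Example 11.1, entered so as to agree with the row totals 4, 2, 4, 1, 2, 2 and with the preferences stated in the text; this entry of the table is an assumption wherever the printed table is illegible, and the function names are illustrative.

```python
from itertools import combinations

FOODS = "ABCDEF"
# PREFERENCES[i][j] == 1 means food i was taken before food j (assumed reading of Table 11.1).
PREFERENCES = [
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1, 1],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 0],
]


def circular_triads(preferences):
    """List the triads (i, j, k) whose three preferences run round a circle."""
    found = []
    for i, j, k in combinations(range(len(preferences)), 3):
        one_way = preferences[i][j] and preferences[j][k] and preferences[k][i]
        other_way = preferences[j][i] and preferences[k][j] and preferences[i][k]
        if one_way or other_way:
            found.append((i, j, k))
    return found


def circular_triads_from_row_totals(preferences):
    """Number of circular triads from the row totals, without enumerating triads."""
    n = len(preferences)
    row_totals = [sum(row) for row in preferences]
    return n * (n - 1) * (n - 2) // 6 - sum(a * (a - 1) for a in row_totals) // 2


triads = ["".join(FOODS[index] for index in triad) for triad in circular_triads(PREFERENCES)]
print(triads, circular_triads_from_row_totals(PREFERENCES))
```

The direct count examines every triad and so grows as the cube of $n$, whereas the row-total formula needs only one pass over the table.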


Coefficient of agreement

11.8 Suppose now that we have $m$ observers each of which provides $\binom{n}{2}$ preferences between pairs of $n$ objects. Suppose that in a table of form 11.1 we enter a unit in the cell in row X and column Y whenever X → Y and count the units in each cell. A cell may then contain any number from 0 to $m$. If the observers are in complete agreement there will be $\binom{n}{2}$ cells containing $m$, the remaining cells being zero. The agreement may be complete even if there are inconsistencies present.

Suppose that the cell in row X and column Y contains the number $\gamma$. Let

$$\Sigma = \sum\binom{\gamma}{2}, \tag{11.5}$$

the summation extending over the $n(n-1)$ cells of the table (the diagonal cells being ignored). Then $\Sigma$ is the sum of the number of agreements between pairs of judges. Put

$$u = \frac{2\Sigma}{\binom{m}{2}\binom{n}{2}} - 1 = \frac{8\Sigma}{m(m-1)\,n(n-1)} - 1. \tag{11.6}$$

We shall call $u$ the coefficient of agreement. If there is complete agreement, and only in this case, $u = 1$. The further we depart from this case, as measured by agreements between pairs of observers, the smaller $u$ becomes. The minimum number of agreements occurs when each cell contains $\tfrac{1}{2}m$ if $m$ is even, or $\tfrac{1}{2}(m \pm 1)$ if $m$ is odd. In the first case the value of $u$ is $-1/(m-1)$, in the second $-1/m$.

This minimum value is not $-1$ unless $m = 2$. In such a case, with two observers, we have

$$u = \frac{4\Sigma}{n(n-1)} - 1 \tag{11.7}$$

so that $u$ may be regarded as a generalisation of the coefficient $\tau$ in this case.
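A minimal sketch of the calculation follows. It assumes the coefficient is $2\Sigma/\{\binom{m}{2}\binom{n}{2}\} - 1$ with $\Sigma$ the sum of $\binom{\gamma}{2}$ over the off-diagonal cells; the two small tables are hypothetical, constructed to show the extreme values, and the function name is illustrative.

```python
from math import comb


def coefficient_of_agreement(counts, m):
    """Coefficient of agreement u from an n x n table of preference counts for m observers."""
    n = len(counts)
    agreements = sum(
        comb(counts[i][j], 2) for i in range(n) for j in range(n) if i != j
    )
    return 2.0 * agreements / (comb(m, 2) * comb(n, 2)) - 1.0


# Three observers who all give the ranking A, B, C (hypothetical data).
complete_agreement = [
    [0, 3, 3],
    [0, 0, 3],
    [0, 0, 0],
]

# Two observers who disagree on every pair (hypothetical data).
complete_disagreement = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]

u_agree = coefficient_of_agreement(complete_agreement, m=3)
u_disagree = coefficient_of_agreement(complete_disagreement, m=2)
print(u_agree, u_disagree)
```

With two observers the second table gives the minimum value of the coefficient, in line with the remark that the minimum is $-1$ only when $m = 2$.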


Example 11.3

[...] written the possible pairs of subjects and asked to underline the one [...] The results were as follows:

21 boys, 13 school subjects.

The preferences are shown in Table 11.2.

TABLE 11.2

PREFERENCES OF 21 BOYS IN 13 SUBJECTS

1 2 3 4 5 6 7 8 9 10 11 12 13 TOTALS


B dm — 14 20 15 15 16 16 18 18 18 20 21 20) 211 
we т — 14 12 18 18 14 16 16-20 16 18 19) 188 
4. Scienc 1 7 — 10 14 10 16:18 16 1017 16/10) NEGO 
5. History і о 1 I 12 15 14 15 (48 1 MG ан 
Gh Geograol 6 4 X 10 — 14 11 12 % та пало ИКО 
MEAS арр $ 831 9 т — 14 14 18 18 16 15 7 157 
eigen t OT e lo т 2 9 DISMISS NIS 
O Enns Ti 3 5 3 т o 7 2 12 14 14 16 18116 
10, 8 Literature 3 5 5 8 T 3 10 d 106 
ir Соттегеіаї subjects | 3 1 5 8 6 8 8 7 11 — 10 10 14) 51 
12 pee 1 Ii i4 805-677 аот 
18. English Grammar | 0 3 5 4 7 6 8 к ОВ tes iuo m 
ш оешу 08 2 or os % PF 

TOTAL | 1,638


The calculation of Σ for this table, in which the objects are arranged in order of total number of preferences, may be shortened by noting that Σ, as given by equation (11.5), may be transformed into the form

$$\Sigma = \sum \gamma^2 - m\sum \gamma + \binom{m}{2}\binom{n}{2}$$

where the summation now takes place over the half of the table below the diagonal. Since the numbers in this half are smaller than those in the other half there is a considerable saving in arithmetic. We find Σ = 9718 and hence

$$u = \frac{2 \times 9718}{\binom{21}{2}\binom{13}{2}} - 1 = 0.186.$$
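The shortened calculation can be checked against the direct sum on any table whose complementary cells add to m. The sketch below is illustrative only: the random table and the function names are assumptions, while Σ = 9718, m = 21 and n = 13 are taken from the example.

```python
import random
from math import comb


def random_table(m, n, seed=0):
    """Random preference table: cells (x, y) and (y, x) sum to m."""
    rng = random.Random(seed)
    table = [[0] * n for _ in range(n)]
    for x in range(n):
        for y in range(x):
            gamma = rng.randint(0, m)
            table[x][y] = gamma
            table[y][x] = m - gamma
    return table


def sigma_direct(table):
    """Sigma as defined in (11.5): C(gamma, 2) over all off-diagonal cells."""
    n = len(table)
    return sum(
        comb(table[x][y], 2) for x in range(n) for y in range(n) if x != y
    )


def sigma_lower_half(table, m):
    """Sigma from the cells below the diagonal only."""
    n = len(table)
    lower = [table[x][y] for x in range(n) for y in range(x)]
    return (
        sum(g * g for g in lower) - m * sum(lower) + comb(m, 2) * comb(n, 2)
    )


# Coefficient of agreement for the figures quoted in Example 11.3.
u_example = 2 * 9718 / (comb(21, 2) * comb(13, 2)) - 1
```

The identity rests on C(γ, 2) + C(m − γ, 2) = γ² − mγ + C(m, 2) for each pair of complementary cells.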

There is thus a certain amount of agreement among the children, indicated by the positive value of u.

The distribution of circular triads was as follows:

No. of Triads Frequency No. of Triads Frequency
[table entries illegible in the source]

The total number of circular triads was 242 with a mean of 11.5. Only one boy was entirely consistent. On the other hand, for n = 13 the maximum number of circular triads is 91, with a mean value of 71.5. It is thus clear that, except perhaps for one boy, we cannot suppose that any boy allotted preferences at random. We are also led to conclude that the boys are genuinely capable of making distinctions, and that consistently; on the whole, half the boys have coefficients ζ greater than 0.92.


Example 11.4

It frequently happens in practice that observers decline to express a preference between some pairs of objects. We then arrive at difficulties similar to those arising from tied ranks. We shall deal with them in the table of type 11.1 by putting ½ in each of the cells row X, column Y and row Y, column X where no preference is expressed between X and Y. The following will illustrate the method. I am indebted for the data to Mr. J. W. Whitfield of the Psychological Laboratory at Cambridge.

46 workers in a department were asked to say which of a pair they considered more important in the 66 possible pairs from the following items:

Ventilation
Canteen facilities
Responsibility
Pension fund
Good opportunities for promotion
Lavatories and Cloakrooms
Work which requires no thought
Lighting
Interesting work
Hours of work
Security of employment (i.e. of work in general)
Tenure of employment (i.e. at this particular factory)

The following table shows the results:
ACER р 


Op L&C WNT Li IW HW SE ТЕ Тотмз 
ҚТЫ Гао OC 2047167 | 8. ^29 24 28} 27 | 1021 

K е E ср вор 2 o 35 25 30^ 24 22 298 

Ri 86 214 — 21 49 23 0 35) 36 32 37 28 | 320 
cp. STIS 32 22 90 32 291 294} 
BER Нан 0) WI ел 82 28} 26 25} 23 | 216 
L&C} 30 21 13 20 25 == 0 18 924 22 24 301 224 
ИМТ | 48 40 46 5% 45 gg 20 46 46 44 46 444| 490i 
Di КВ ИЛ лова Pin ови а 5 20 25 27} 20 | 207 
DU uou J8 Ло T4 try 5800 502 14 33 23 | 199 
Ri] DX NR avo ПР ES 92 — 32 18}| 218} 
ЗЕ IN 19^. 9 14 207 22 9. Ер 38. ја. 20 107 
ШЕСІ” isk јв 36 798. isi IF 23 27k 19} — | 208 
TOTALS | 313} 208 186 211} 290 282 9} 299 307 2871 339 303 | 3,036 


We find from the sums of squares of items in the table

$$\sum \gamma^2 = 86{,}392, \qquad \sum \gamma = 3036.$$

Thus

$$\tfrac{1}{2}\sum \gamma(\gamma - 1) = 41{,}678.$$

Hence, if we use formula (11.6) without regard to fractional γ's,

$$u = \frac{2 \times 41{,}678}{\binom{46}{2}\binom{12}{2}} - 1 = 0.220.$$
Consider now a score such as that in row C column R, 24½, with the complementary score in column C row R of 21½. As we have just calculated the score, the contributions of these two are

$$\binom{24\frac{1}{2}}{2} + \binom{21\frac{1}{2}}{2} = \frac{1}{2}\left\{\binom{25}{2} + \binom{21}{2}\right\} + \frac{1}{2}\left\{\binom{24}{2} + \binom{22}{2}\right\} - \frac{1}{4}.$$

Thus our crude method is equivalent to taking an average of the undetermined preference, first by assuming R → C so that the scores are 25 and 21, secondly by assuming R ← C so that the scores are 24 and 22; except for the factor ¼, which is negligible. This is in accordance with our treatment of ties in the ranking case.

In the above table the score shown for row R column P, 21, was in fact 20 + 2/2, i.e. comprised two halves, and similarly that in column R row P was 24 + 2/2. If we were to average the possible scores we should have

¼(20, 26) + ½(21, 25) + ¼(22, 24)

and the difference from our actual count of (21, 25) is thus

$$\frac{1}{4}\left\{\binom{20}{2} + \binom{26}{2} + \binom{22}{2} + \binom{24}{2}\right\} - \frac{1}{2}\left\{\binom{21}{2} + \binom{25}{2}\right\} = \frac{1}{2}.$$

Again the difference is negligible.
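The two small corrections in this example are easy to verify with exact fractions. The following is only an arithmetic check of the figures quoted above; the helper `pairs`, which extends C(γ, 2) = γ(γ − 1)/2 to fractional γ, is a hypothetical name.

```python
from fractions import Fraction as F


def pairs(gamma):
    """gamma * (gamma - 1) / 2, allowing half-integer scores."""
    gamma = F(gamma)
    return gamma * (gamma - 1) / 2


# One undecided judge: crude scores 24 1/2 and 21 1/2.
crude_single = pairs(F(49, 2)) + pairs(F(43, 2))
averaged_single = F(1, 2) * (pairs(25) + pairs(21)) + F(1, 2) * (
    pairs(24) + pairs(22)
)
difference_single = crude_single - averaged_single

# Two undecided judges: the crude count treats the scores as (21, 25).
crude_double = pairs(21) + pairs(25)
averaged_double = (
    F(1, 4) * (pairs(20) + pairs(26))
    + F(1, 2) * (pairs(21) + pairs(25))
    + F(1, 4) * (pairs(22) + pairs(24))
)
difference_double = averaged_double - crude_double
```

Both differences are small fractions of a unit, against a total Σ of 41,678.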


11.9 A test of significance of u can be obtained by considering what the distribution would be if all the preferences were allotted at random. These distributions have been worked out for values of m = 3, n = 2 to 8; m = 4, n = 2 to 6; m = 5, n = 2 to 5; m = 6, n = 2 to 4, and form the basis of Appendix Tables 10.

For higher values of m and n a sufficient approximation is given by the χ²-distribution. We write

$$\chi^2 = \frac{4}{m-2}\left\{\Sigma - \frac{1}{2}\binom{n}{2}\binom{m}{2}\frac{m-3}{m-2}\right\} \qquad (11.8)$$

$$\nu = \binom{n}{2}\frac{m(m-1)}{(m-2)^2} \qquad (11.9)$$

and test in the χ²-distribution with ν degrees of freedom. A continuity correction may be applied by deducting unity from Σ.
For example, with m = 3, n = 8, we have

$$\chi^2 = 4\Sigma, \qquad \nu = 168.$$

From Appendix Table 10A we have, exactly,

for Σ = 54, P = 0.011
for Σ = 58, P = 0.0011

For these values of Σ (with continuity corrections) the corresponding values of χ² are 212 and 228. For ν = 168 we can take √(2χ²) to be distributed normally with unit variance about √(2ν − 1) = 18.30, so that our deviates are

$$\sqrt{2 \times 212} - 18.30 = 2.29$$

and

$$\sqrt{2 \times 228} - 18.30 = 3.05.$$

From Appendix Table 3 these are seen to correspond to probabilities 0.011 and 0.00114, very close to the exact values.

Similarly, with m = 6, n = 4 we find from (11.8) and (11.9) that Σ − 33.75 is distributed as χ² with 11.25 degrees of freedom. From Appendix Table 10D we see that the 1 per cent point lies between Σ = 59 and Σ = 60; the corresponding χ² values are then (with continuity corrections) 24.25 and 25.25. From Appendix Table 8 we see that these values do in fact fall very close to the 1 per cent point for ν = 11.25, which is somewhere between 24.725 (ν = 11) and 26.217 (ν = 12).


Example 11.5

In Example 11.3, for n = 13, m = 21, we found Σ = 9718 and u = 0.186. This indicates some community of preference, but not a very large amount. Is the value significant?

From (11.8) and (11.9) we find

$$\chi^2 = \frac{4}{19}\left\{9718 - \frac{1}{2}\binom{13}{2}\binom{21}{2}\frac{18}{19}\right\} = 412.4$$

$$\nu = \binom{13}{2}\frac{21 \times 20}{19^2} = 90.7$$

$$\sqrt{2\chi^2} - \sqrt{2\nu - 1} = 15.3.$$

This is far beyond any ordinary significance point, and we conclude that the observed u could not have arisen by chance from a population in which all the boys allotted preferences at random. This confirms our conclusion reached in Example 11.3.

Again, in Example 11.4 we find

$$\chi^2 = 754.5, \qquad \nu = 70.6$$

$$\sqrt{2\chi^2} - \sqrt{2\nu - 1} = 27.02,$$

again an extremely improbable result if the preferences were allotted at random. We may conclude that the observed value of u is significant.
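The significance calculations of 11.9 and of this example can be reproduced with a short sketch. It assumes that the approximation takes the form χ² = 4/(m − 2) · {Σ − ½ C(n,2) C(m,2) (m − 3)/(m − 2)} with ν = C(n,2) m(m − 1)/(m − 2)² degrees of freedom, and that large ν is handled through the normal deviate √(2χ²) − √(2ν − 1); the function names are hypothetical.

```python
from math import comb, sqrt


def chi_square_for_agreement(sigma, m, n, continuity=False):
    """Return (chi-square, degrees of freedom) for an observed Sigma."""
    if continuity:
        sigma = sigma - 1  # continuity correction: deduct unity from Sigma
    centre = 0.5 * comb(n, 2) * comb(m, 2) * (m - 3) / (m - 2)
    chi2 = 4 / (m - 2) * (sigma - centre)
    nu = comb(n, 2) * m * (m - 1) / (m - 2) ** 2
    return chi2, nu


def normal_deviate(chi2, nu):
    """Normal approximation to chi-square for large degrees of freedom."""
    return sqrt(2 * chi2) - sqrt(2 * nu - 1)


# Figures from Example 11.3 (21 boys, 13 subjects) and Example 11.4.
boys = chi_square_for_agreement(9718, m=21, n=13)
workers = chi_square_for_agreement(41678, m=46, n=12)
```

Calling `normal_deviate(*boys)` gives the deviate referred to the normal table; the m = 3, n = 8 case of 11.9 is obtained with `continuity=True`.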


References 


See Kendall and Babington Smith (1940) and Moran (1947). 


CHAPTER 12
PROOF OF THE RESULTS OF CHAPTER 11 


12.1 We will first establish the result that in a complex of paired comparisons the maximum number of circular triads is $\tfrac{1}{24}(n^3 - n)$ for n odd and $\tfrac{1}{24}(n^3 - 4n)$ for n even.

Consider a polygon of the type of Fig. 11.1 with n vertices. There will be n − 1 lines emanating from each vertex. Let $a_1, \ldots, a_n$ be the number of arrows which leave the vertices. Then

$$\sum_{i=1}^{n} a_i = \binom{n}{2} \qquad (12.1)$$

and the mean value of a is ½(n − 1). Consider the function

$$T = \sum_{i=1}^{n}\left\{a_i - \tfrac{1}{2}(n-1)\right\}^2 \qquad (12.2)$$

that is to say, n times the variance of the a-numbers. We have at once

$$T = \sum a_i^2 - \tfrac{1}{4}\,n(n-1)^2 \qquad (12.3)$$


12.2 We now show that if the direction of a preference is altered and the effect is to increase the number of circular triads by p, T is reduced by 2p, and vice versa. Consider the preference A → B. The only triads affected by reversing to B → A are those containing the line AB. Suppose there are α preferences of type A → X (including A → B) and β of type B → X. Then there are four possible types of triads:

A → X ← B, say γ in number
A ← X → B,
A → X → B, which must number α − γ − 1
A ← X ← B, which must number β − γ

When the preference A → B is reversed the first two remain non-circular. The third becomes circular, the fourth ceases to be so. The increase in the number of circular triads is

$$(\alpha - \gamma - 1) - (\beta - \gamma) = \alpha - \beta - 1 = p.$$

The reduction in T is

$$\alpha^2 - (\alpha - 1)^2 + \beta^2 - (\beta + 1)^2 = 2(\alpha - \beta - 1) = 2p.$$

Our result follows; for the effect of altering individual preferences is cumulative on T and d.
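The effect of a single reversal can be confirmed numerically. The sketch below is an illustration under the assumption that preferences are held in a 0/1 matrix with `pref[a][b] == 1` for A → B; it reverses one preference and reports p = α − β − 1 alongside the observed change in the number of circular triads and in T. All names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import combinations


def random_tournament(n, rng):
    """Random complete set of paired preferences on n objects."""
    pref = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < 0.5:
            pref[i][j] = 1
        else:
            pref[j][i] = 1
    return pref


def count_circular(pref):
    """Number of circular triads, by direct inspection."""
    n = len(pref)
    return sum(
        1
        for i, j, k in combinations(range(n), 3)
        if (pref[i][j] and pref[j][k] and pref[k][i])
        or (pref[j][i] and pref[k][j] and pref[i][k])
    )


def t_statistic(pref):
    """T: sum of squared deviations of row totals from (n - 1)/2."""
    n = len(pref)
    mean = Fraction(n - 1, 2)
    return sum((sum(row) - mean) ** 2 for row in pref)


def reversal_effect(pref, a, b):
    """Reverse A -> B; return (p, increase in d, reduction in T)."""
    assert pref[a][b] == 1, "expects the preference A -> B to be present"
    alpha, beta = sum(pref[a]), sum(pref[b])
    p = alpha - beta - 1
    reversed_pref = [row[:] for row in pref]
    reversed_pref[a][b], reversed_pref[b][a] = 0, 1
    gain_in_d = count_circular(reversed_pref) - count_circular(pref)
    drop_in_t = t_statistic(pref) - t_statistic(reversed_pref)
    return p, gain_in_d, drop_in_t
```

Note that p may be negative, in which case the reversal removes circular triads and T increases.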


12.3 From the definition of T it is clear that the maximum value is given when the data are ranked, and thus max T = $\tfrac{1}{12}(n^3 - n)$. For the minimum value consider a polygon with vertices $A_1, \ldots, A_n$. Set up the preferences $A_1 \to A_2 \to \cdots \to A_n \to A_1$. Next set up the preferences $A_1 \to A_3 \to A_5 \to \cdots$. If this does not provide a closed tour of all the points of the polygon proceed to the next unvisited vertex, $A_k$, and set up the preferences $A_k \to A_{k+2} \to$ etc. and so on. Then set up the preferences $A_1 \to A_4 \to A_7$ etc. and so on until the whole preference schema is completed.

If n is odd the preferences described will consist of circular tours of the polygon, and each a will be ½(n − 1) so that T = 0; and this is obviously a minimum. If n is even the last preference $A_1 \to A_{\frac{1}{2}n+1}$ will not be a tour but will consist of a single line joining one vertex with the symmetrically opposite vertex. Thus there will be ½n vertices with a = ½n and ½n with a = ½n − 1. In this case T = ¼n and again this is a minimum.

Thus T can range from 0 or ¼n to $\tfrac{1}{12}(n^3 - n)$, and since an increase of two in T corresponds to a decrease of unity in d, there are the following maximum values of d:

$$\tfrac{1}{24}(n^3 - n), \quad n \text{ odd}$$

$$\tfrac{1}{24}(n^3 - 4n), \quad n \text{ even}.$$
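The construction of 12.3 can be sketched as follows. This is an illustration, not the author's procedure verbatim: each vertex is made to prefer the next ⌊(n − 1)/2⌋ vertices round the polygon and, for even n, the first half of the vertices are preferred to their opposites. The function names are hypothetical, and the exhaustive search is only practical for very small n.

```python
from itertools import combinations, product


def balanced_tournament(n):
    """Preferences with row totals as nearly equal as possible."""
    pref = [[0] * n for _ in range(n)]
    half = (n - 1) // 2
    for i in range(n):
        for step in range(1, half + 1):
            pref[i][(i + step) % n] = 1
    if n % 2 == 0:
        # Remaining lines join opposite vertices of the polygon.
        for i in range(n // 2):
            pref[i][i + n // 2] = 1
    return pref


def count_circular(pref):
    """Number of circular triads, by direct inspection."""
    n = len(pref)
    return sum(
        1
        for i, j, k in combinations(range(n), 3)
        if (pref[i][j] and pref[j][k] and pref[k][i])
        or (pref[j][i] and pref[k][j] and pref[i][k])
    )


def max_triads_formula(n):
    """(n^3 - n)/24 for odd n, (n^3 - 4n)/24 for even n."""
    return (n**3 - n) // 24 if n % 2 else (n**3 - 4 * n) // 24


def max_triads_exhaustive(n):
    """Largest d over every possible preference schema (small n only)."""
    pairs = list(combinations(range(n), 2))
    best = 0
    for bits in product((0, 1), repeat=len(pairs)):
        pref = [[0] * n for _ in range(n)]
        for (i, j), bit in zip(pairs, bits):
            if bit:
                pref[i][j] = 1
            else:
                pref[j][i] = 1
        best = max(best, count_circular(pref))
    return best
```

The exhaustive search runs through 2^C(n,2) schemas, so it serves only as a cross-check for n up to about 5.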


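12.4 The numbers a are the totals of rows in the table of type 11.1, and from what has been said it follows that the number of circular triads is given by

$$\begin{aligned}
d &= \tfrac{1}{2}\left\{\tfrac{1}{12}(n^3 - n) - T\right\} \\
  &= \tfrac{1}{2}\left\{\tfrac{1}{12}(n^3 - n) + \tfrac{1}{4}n(n-1)^2 - \sum a^2\right\} \\
  &= \tfrac{1}{12}\,n(n-1)(2n-1) - \tfrac{1}{2}\sum a^2 \\
  &= \tfrac{1}{6}\,n(n-1)(n-2) - \tfrac{1}{2}\sum a(a-1)
\end{aligned}$$

as given in (11.3) and (11.4).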


12.5 We now establish the χ²-approximation to the distribution of d as n becomes large.

Let the objects be numbered from 1 to n. Write $P_{ijk} = 1$ if the triad (i, j, k) is circular and $P_{ijk} = 0$ if it is not. Then

$$d = \sum P_{ijk} \qquad (12.4)$$

the summation taking place over all triads. Thus

$$E(d) = \binom{n}{3}E(P).$$

By enumerating the possible cases for preferences in a given triad we see that E(P) = ¼. Hence, for the mean value of d,

$$E(d) = \frac{1}{4}\binom{n}{3} \qquad (12.5)$$

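Now consider $E(\sum P_{ijk})^2$. When we expand this there will be $\binom{n}{3}$ terms of type $P_{ijk}^2$; $3\binom{n}{3}\binom{n-3}{2}$ terms of type $P_{ijk}P_{ilm}$, where j ≠ l, k ≠ m; $3\binom{n}{3}(n-3)$ terms of type $P_{ijk}P_{ijl}$, where k ≠ l; and $\binom{n}{3}\binom{n-3}{3}$ terms of type $P_{ijk}P_{lmp}$ with different suffixes.

By examining particular configurations we see that the expectation of the first is ¼ and of each of the others is 1/16. Thus

$$E\left(\sum P_{ijk}\right)^2 = \frac{1}{4}\binom{n}{3} + \frac{1}{16}\left\{3\binom{n}{3}\binom{n-3}{2} + 3\binom{n}{3}(n-3) + \binom{n}{3}\binom{n-3}{3}\right\}.$$

Hence

$$\mu_2(d) = E(d^2) - \{E(d)\}^2 = \frac{3}{16}\binom{n}{3} \qquad (12.6)$$

The calculation of the third and fourth moments is much more complicated, but is the same in principle. We find

$$\mu_3 = -\frac{3}{32}\binom{n}{3}(n-4) \qquad (12.7)$$

$$\mu_4 = \frac{3}{256}\binom{n}{3}\left\{9\binom{n-3}{3} + 39\binom{n-3}{2} + 9\binom{n-3}{1} + 7\right\} \qquad (12.8)$$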
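For small n the moments of d can be obtained by brute force, which gives a check on the mean and variance stated above. The sketch enumerates every one of the 2^C(n,2) preference schemas, so it is only feasible for n up to about 6; the third moment is compared with −(3/32) C(n,3)(n − 4), the form assumed here because it is the one consistent with (12.11). Function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def triad_count_distribution(n):
    """Frequency of each value of d over all 2^C(n,2) preference schemas."""
    pairs = list(combinations(range(n), 2))
    triads = list(combinations(range(n), 3))
    freq = {}
    for bits in product((0, 1), repeat=len(pairs)):
        beats = dict(zip(pairs, bits))  # beats[(i, j)] == 1 means i -> j
        d = 0
        for i, j, k in triads:
            ij, jk, ik = beats[(i, j)], beats[(j, k)], beats[(i, k)]
            # Circular when i -> j -> k -> i or the reverse cycle.
            if ij == jk and ij != ik:
                d += 1
        freq[d] = freq.get(d, 0) + 1
    return freq


def mean_of(freq):
    """Exact mean of the distribution."""
    total = sum(freq.values())
    return Fraction(sum(d * f for d, f in freq.items()), total)


def central_moment(freq, r):
    """Exact r-th moment about the mean."""
    total = sum(freq.values())
    mean = mean_of(freq)
    return sum(f * (d - mean) ** r for d, f in freq.items()) / total
```

Exact fractions avoid any rounding question when comparing with the closed forms.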
12.6 The moments of the χ² distribution,

$$dF \propto e^{-\frac{1}{2}\chi^2}\chi^{\nu-1}\,d\chi,$$

are given by

$$\mu'_1 \text{ (about zero)} = \nu, \qquad \mu_2 = 2\nu, \qquad \mu_3 = 8\nu.$$

We note that μ₃(d) is negative whereas μ₃(χ²) is positive. We shall therefore measure d from the other end of its distribution so as to bring the distributions into accord. Applying a correction for continuity, we see that

$$\chi^2 = k\left\{\frac{1}{4}\binom{n}{3} - d + \frac{1}{2}\right\} + \nu \qquad (12.9)$$

has mean value ν, the mean of χ². We will choose k so that the distributions have the same variance, which is obviously so when

$$k^2\mu_2(d) = \frac{3}{16}k^2\binom{n}{3} = 2\nu \qquad (12.10)$$

For the third moments to be equal we must have

$$\frac{3}{32}k^3\binom{n}{3}(n-4) = 8\nu,$$

leading to

$$\nu = \frac{n(n-1)(n-2)}{(n-4)^2} \qquad (12.11)$$

Using this value of ν in (12.10) we find that

$$\chi^2 = \frac{8}{n-4}\left\{\frac{1}{4}\binom{n}{3} - d + \frac{1}{2}\right\} + \nu \qquad (12.12)$$

has the first three moments the same as those of χ².
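As a brief illustration of how the approximation would be applied, the function below converts an observed number of circular triads into the χ² variate and its degrees of freedom. It assumes the final form χ² = 8/(n − 4) · {¼ C(n,3) − d + ½} + ν with ν = n(n − 1)(n − 2)/(n − 4)², so it requires n > 4; the function name is hypothetical.

```python
from math import comb


def chi_square_for_triads(d, n):
    """Return (chi-square, degrees of freedom) for d circular triads."""
    if n <= 4:
        raise ValueError("the approximation needs n > 4")
    nu = n * (n - 1) * (n - 2) / (n - 4) ** 2
    k = 8 / (n - 4)
    chi2 = k * (comb(n, 3) / 4 - d + 0.5) + nu
    return chi2, nu


def scaled_variance(n):
    """k^2 * mu_2(d), which the fitting makes equal to 2 * nu."""
    k = 8 / (n - 4)
    return k * k * 3 * comb(n, 3) / 16
```

Small values of d give large χ², so consistency is tested in the upper tail.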


12.7 It may be shown that the distribution of d tends to normality as n tends to infinity. The proof, for details of which see Moran (1947), follows the lines of those given in Chapter 5. We show that moments of odd order are of lower order in n than those of even order. In an expansion

$$\left(\sum Q_{ijk}\right)^{2m}$$

where $Q_{ijk} = P_{ijk} - \tfrac{1}{4}$, the dominant term is of type $Q_{ijk}^2 Q_{lmp}^2 \cdots$, occurring in m factors, with expectation $\left(\tfrac{3}{16}\right)^m$ and frequency $\dfrac{(2m)!}{2^m m!}$.

Thus the dominant term gives

$$\mu_{2m} \sim \frac{(2m)!}{2^m m!}\left(\frac{3}{16}\right)^m\left(\frac{n^3}{6}\right)^m = \frac{(2m)!}{2^m m!}\left(\frac{n^3}{32}\right)^m \sim \frac{(2m)!}{2^m m!}(\mu_2)^m$$

and the tendency to normality follows.


12.8 Finally, we derive the χ²-approximation for testing the coefficient of agreement, u, when all preferences are allotted at random.

The contribution to Σ from two cells (row X, column Y) and (row Y, column X) is typified by

$$\binom{\gamma}{2} + \binom{m-\gamma}{2} \qquad (12.13)$$

Of the $2^m$ total ways in which the m preferences can be allotted to the cells there will be $\binom{m}{\gamma}$ in which γ units occur. Consequently the frequency of the contribution to Σ is the corresponding coefficient of t in the array

$$f = t^{\binom{m}{2}} + \binom{m}{1}t^{\binom{m-1}{2}} + \binom{m}{2}t^{\binom{m-2}{2}+\binom{2}{2}} + \cdots + \binom{m}{\gamma}t^{\binom{m-\gamma}{2}+\binom{\gamma}{2}} + \cdots \qquad (12.14)$$

Now if the preferences are allotted at random the contributions to Σ for the $\binom{n}{2}$ cells are independent; and hence the distribution of Σ is given by

$$f^{\binom{n}{2}} \qquad (12.15)$$

the frequency of Σ being the coefficient of $t^{\Sigma}$ in this array.

For instance, with m = 3, n = 4,

$$f = t^3 + 3t + 3t + t^3$$

and the distribution is arrayed by

$$(2t^3 + 6t)^6 = 2^6t^6(3 + t^2)^6 = 2^6t^6\left(729 + 1458t^2 + 1215t^4 + 540t^6 + 135t^8 + 18t^{10} + t^{12}\right)$$

with a total frequency of $2^6 \times 4^6$. These and similar values form the basis of Appendix Tables 10.
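The array can be expanded mechanically. The following sketch is illustrative only: it represents a polynomial in t as a dictionary from exponent to coefficient and multiplies the single-pair array by itself C(n,2) times. The function names are hypothetical.

```python
from math import comb


def pair_array(m):
    """Array f for one pair of cells: exponent -> frequency."""
    f = {}
    for gamma in range(m + 1):
        exponent = comb(gamma, 2) + comb(m - gamma, 2)
        f[exponent] = f.get(exponent, 0) + comb(m, gamma)
    return f


def poly_mul(p, q):
    """Product of two polynomials held as exponent -> coefficient maps."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return out


def sigma_distribution(m, n):
    """Frequencies of Sigma when m observers choose at random on n objects."""
    dist = {0: 1}
    f = pair_array(m)
    for _ in range(comb(n, 2)):
        dist = poly_mul(dist, f)
    return dist


def tail_probability(dist, sigma):
    """Probability that Sigma attains or exceeds the given value."""
    total = sum(dist.values())
    return sum(c for s, c in dist.items() if s >= sigma) / total
```

For example, `sigma_distribution(3, 4)` holds the coefficients of the expansion given above for m = 3, n = 4, and `tail_probability` turns any such array into a significance level.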


12.9 For constant m the distribution of Σ tends to normality with increasing n, for it is the mean of $\binom{n}{2}$ constituents with finite equal moments.* For constant n the distribution tends with increasing m to a form of the χ² distribution; for each of the $\binom{n}{2}$ cells contributes a variate $\binom{\gamma}{2} + \binom{m-\gamma}{2}$, which is effectively like γ², and the distribution of γ tends to normality.

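The rth moment of Σ about the origin is given by

$$2^{m\binom{n}{2}}\mu'_r = \left[\left(t\frac{d}{dt}\right)^r f^{\binom{n}{2}}\right]_{t=1}$$

The differentiation, in fact, multiplies a term containing $t^{\Sigma}$ by $\Sigma^r$, and when we put t = 1 the array becomes the sum of frequencies each multiplied by $\Sigma^r$, which provides the rth moment. Thus, for the first moment,

$$2^{m\binom{n}{2}}\mu'_1 = \binom{n}{2}\,2^{m\left\{\binom{n}{2}-1\right\}}\sum_{\gamma=0}^{m}\binom{m}{\gamma}\left\{\gamma^2 - m\gamma + \tfrac{1}{2}(m^2 - m)\right\}$$

giving

$$\mu'_1 = \frac{1}{2}\binom{n}{2}\binom{m}{2}.$$

In a similar way (we omit the algebra) we find

$$\mu_2 = \frac{1}{4}\binom{n}{2}\binom{m}{2}$$

$$\mu_3 = \frac{3}{4}\binom{n}{2}\binom{m}{3}$$

$$\mu_4 = \binom{n}{2}\binom{m}{2}\left\{\frac{3m^2 - 15m + 17}{8} + \frac{3}{32}\binom{n}{2}m(m-1)\right\}$$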
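The moments of Σ can be checked exactly for small m and n by building the distribution and computing its moments directly. The closed forms compared below are assumptions reconstructed for this sketch, namely mean ½ C(n,2) C(m,2), variance ¼ C(n,2) C(m,2), third moment ¾ C(n,2) C(m,3) and fourth moment C(n,2) C(m,2) {(3m² − 15m + 17)/8 + (3/32) C(n,2) m(m − 1)}; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def sigma_distribution(m, n):
    """Frequencies of Sigma under random allotment, by repeated convolution."""
    pair = {}
    for gamma in range(m + 1):
        e = comb(gamma, 2) + comb(m - gamma, 2)
        pair[e] = pair.get(e, 0) + comb(m, gamma)
    dist = {0: 1}
    for _ in range(comb(n, 2)):
        nxt = {}
        for e1, c1 in dist.items():
            for e2, c2 in pair.items():
                nxt[e1 + e2] = nxt.get(e1 + e2, 0) + c1 * c2
        dist = nxt
    return dist


def exact_moments(m, n):
    """Mean and central moments of order 2, 3, 4 from the distribution."""
    dist = sigma_distribution(m, n)
    total = sum(dist.values())
    mean = Fraction(sum(s * c for s, c in dist.items()), total)
    central = [
        sum(c * (s - mean) ** r for s, c in dist.items()) / total
        for r in (2, 3, 4)
    ]
    return (mean, *central)


def closed_form_moments(m, n):
    """Closed forms assumed in the lead-in above."""
    big_n, big_m = comb(n, 2), comb(m, 2)
    mean = Fraction(big_n * big_m, 2)
    mu2 = Fraction(big_n * big_m, 4)
    mu3 = Fraction(3 * big_n * comb(m, 3), 4)
    mu4 = big_n * big_m * (
        Fraction(3 * m * m - 15 * m + 17, 8)
        + Fraction(3, 32) * big_n * m * (m - 1)
    )
    return mean, mu2, mu3, mu4
```

With these moments, matching the variance and third moment to those of χ² gives the multiplier 4/(m − 2) and the degrees of freedom C(n,2) m(m − 1)/(m − 2)².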


Proceeding in the manner of 12.6 we see that

$$\chi^2 = \frac{4}{m-2}\left\{\Sigma - \frac{1}{2}\binom{n}{2}\binom{m}{2}\frac{m-3}{m-2}\right\} \qquad (12.21)$$

is distributed in the usual form with

$$\nu = \binom{n}{2}\frac{m(m-1)}{(m-2)^2}$$

degrees of freedom.

* This is a particular case of the Central Limit Theorem; see Kendall, Advanced Theory, Vol. I, 7.32.


References 


See references to the previous chapter.

APPENDIX TABLE 1

PROBABILITY THAT S (FOR τ) ATTAINS OR EXCEEDS A SPECIFIED VALUE.
(SHOWN ONLY FOR POSITIVE VALUES. NEGATIVE VALUES OBTAINABLE BY SYMMETRY)


| 
Values of n | Values of n 
5 S» 
а | ë 8 9 ene 7 10 
M . ас || И ни "» 

0 0-625 | 0:592 0-540 1 0-500 | 0-500 | 0-500 
2 0-375 | 0-408 0-460 3 0-360 | 0:386 | 0-431 
4 0-167 0:242 0-381 Б» 0:285 0:281 0-364 
6 0:042 0-117 0-306 7 | 0-186 0-191 0-300 
8 | 0-042 0-238 9 0-068 0-119 0-242 
10 | 0-0°83 0-179 11 0-028 0-068 0-190 
12 | 0-130 13 0:083 | 0:035 | 0-146 
14 | 0-090 15 0.0214 | 0015 | 0-108 
16 | 0-060 17 0-0254 | 0-078 
18 0-038 19 0:0214 | 0-054 
20 0-071 | 0-022 21 0-0320 | 0-036 
22 0:0228 | 0-012 23 0-023 
24 0-0°87 | 0-0°63 25 0-014 

26 0-0319 | 0-029 27 0-0288 
28 0۰0425 | 00212 29 00°46 
30 0-0?43 31 0.023 
32 0۰.012 33 0:011 
34 0.0425 35 0.0347 
36 0:0528 87 | 0-0°18 
39 0:0%58 

41 0-0415 

48 0:0528 

45 0.028 


Note. Repeated zeros are indicated by powers, e.g. 0.0³47 stands for 0.00047.

APPENDIX TABLE 2

PROBABILITY THAT S(d²) (FOR ρ) WILL BE ATTAINED OR EXCEEDED FOR VALUES OF n FROM 4 TO 8 INCLUSIVE


S(d*) ; 
0 2 | 4 в 8 | 10 | 12 | 1 | 36 | 18 | 20 | | 2 | ов - 
ned = 0:958 | 0:833 | 0-792 | 0-625 | 0:542 | 0458 | 0-375 | 0-208 | 0-167 | 0-042 
т-5 1 | 0-992 | 0-958 | 0-933 | 0-883 | 0-825 | 0-775 | 0.242 0-658 | 0-608 | 0-525 | 0-475 | 0-392 | 0-342 0:258 | 
n=6 1 | 0-999 | 0-992 | 0-983 | 0-971 | 0-949 | 0-932 | 0-912 m 0-851 | 0-822 | 0-790 | 0-751 | 0-718 | 0-671 
Жетер үст 1:000 | 0-999 | 0-097 | 0-994 | 0-088 | 0-983 | 0-976 0-967 | 0-956 | 0-945 | 0-931 | 0-917 | 0-900 | 0-882 
n=8 1 | 1-000 | 1-000 | 0-999 | 0-999 | 0:998 0:996 | 0-995| 0-992 | 0-989 | 0-986 | 0-082 0-977 | 0-971 | 0-965 
30 m [РУ m w 42 4 46 | 48 | 50 | s2 | ë4 | co | 58 
|. | 
n=5 mes 0-175 | 0117 | 0-067 | 0-042 | 0.0582 
= А ---- 
n=6 m 0:599 | 0-540 | 0-500 | 0-460 | 0-401 EU | 9:282 | 0:249 | 0-210 | 0-178 | 0-149 0:121 | 0-088 
~ 0 
* [0807 | 0-849 | 0-823 | 0-802 | 0-778 | 0-751 | 0-799 0:703 | 0-69 | 0-643 | 0-609 | о-во | 0-547 | 0-518 0482 
n=8 | oss 0-952 | 0-943 | 0-934 | 0-924 | 0.915 | 0-902 0802 0-878 | 0-866 ЕСТІ 0-837 | 0-820 | 0-805 | 0-786 
60 | 62 | e& | ec | вв | vo | > 7$ | 76 | тв | во | аз | ва | вв | 88 
"-6 [0088 | 0-051 | 0-029 | 0-017 | 0.0383 6.0:14 1 
n= 7 04620420 | 0:391 | 0-357 | 0:331 | 0.207 9278 | 0249 | 0-222 0-198 | 0-177 [0.151 | 0122 0-118 | 0-100 
nes ШЕ 0:750 | 0-732 | 0-709 | 0-690 бз 0:048 | 0-624 | 0-603 | 0-580 | 0-559 | 0-533 | 0.512 0-488 | 0-467 
90 "m 94 90 : 98 | 100 | 102 | 104 | 106 108 | 110. ТЕ 114 | 16 | 118 
"=7 |02062 0009 | 0055 | 0914 | 0-033 E pod 0012 | 00:62) 0.0234! 0.0214! 0.0220! Y 
пв [ола | 0-420 | 0-397 0-876 | 0-252 | 0-332 | 0-310 | 0-29) 0:268 | 0-250 | 0-231 | 0-214 | 0-195 | 0-180 | 0-163 
120 | 122 | 194 SUE UTR КЕЙ ла 136 | 138 "m 142 Л 146 | 148 
^78 [0150 0-131 | 0-122 | 0108 бф | Goss 0.076 | одев 0057 | 0-048 | 0-042 | 0-035 | 0-029 | 0-023 | 0.018 
150 | 152 | 154 | 156 158 | 160 | 162] 164 | 166 | 18 
б 8-8 | 0-014 | 901 0075 | owt 0-0°36- ТЕ 0-0257 | 0.0320 ШЫЛ 
і | 


Note. Repeated zeros are indicated by powers, e.g. 0.0³20 stands for 0.00020.


| 


APPENDIX TABLE 3
AREAS UNDER THE NORMAL CURVE (PROBABILITY FUNCTION OF THE NORMAL DISTRIBUTION)

The table shows the area of the curve $y = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}$ lying to the left of specified deviates x; e.g. the area corresponding to a deviate 1.86 (= 1.5 + 0.36) is 0.9686.


| 1 
Deviate Or + 0:5 + TO + 15 + 2:0 + 25 + 80 + 35 + 

0:00 5000 6915 9882 9772 92279 92865 9377 

0:01 5040 6950 9845 9778 97396 92869 9278 

0:02 5080 6985 9357 9783 97418 92874 9378. 

0:03 `| 5120 7019 9370 9788 9?430 92878 9379 

0:04 5160 7054 9382 9793 97446 9:882 9°80 

5199 7088 9394 798 92461 92886 9581 

5289 7123 9406 9803 92477 97880 9381 

5279 7157 9418 9808 97492 9?893 9382 

5819 7190 9429 9812 92506 92897 9383 

5359 7224 9441 9817 97520 97900 9388 

5398 7257 9452 9821 92534 9203 9384 

5438 7291 9463 9826 9?547 9*06 9°85 

| 5478 7324 9474 9830 9?560 9210 9385 
1 5517 7857 9484 9834 92573 9318 9386 
5557 7889 9495 9838 92585 9316 9386 

5596 7422 9505 9842 92598 9218 9387 

5686 7454 9515 9846 92609 9321 9387 

5675 7486 9525 9850 92621 9324. 9:88 

5714. 7517 9585 9854 92632 9326 9°88 

5753 7549 9545 9857 92643 9320 9389 

5798 7580 9554 9861 92653 9331 9389 

5832 7611 9564 9864 92664 9284 9390 

5871 7642 9578 9868 92674 9°36 9290 

5910 7673 9582 9871 92683 9°38 9:04 

5948 7704 9591 9875 92693 9340 9:08 

5987 7738 9599 9878 9?702 9342 9:12 

6026 7764 9608 9881 92711 9344 9:15 

6064 77194 9616 9884 92720 9°46 9418 

6103 7823 9625 9887 9:728 9248 9422 

| 6141 7852 9633 9890 9°736 9°50 9425 
6179 7881 9641 9893 92744 9352 9428 

6217 7910 9649 9896 9752 9558 9:81 

6255 7939 9656 9898 92760 9355 9:88 

6298 7967 9664 9901 92767 9257 9:36 

6331 7995 9671 9904 92774 9358 9439 

6368 8023 9678 9906 9°781 9°60 9441 

6406 8051 9686 9909 9°788 9°61 9:48 

6443 8078 9693 9911 9°795 9°62 9:46 

6480 8106 9699 9918 9°801 9°64 9:48 

6517 8138 9706 9916 92807 9265 9:50 

1 8554 8159 9713 9918 J 97813 9366 9:52 
6591 8186 9719 9920 92819 9268 954 

6628 8212 9726 9922 92825 9°69 9:56 

| 6664 8238 9782 9925 92831 9°70 9458 
6700 8264 9788 9927 92836 971 9459 

| 6786 8289 9744 | 9929 | 9:841 | 9°72 961 
6772 | 8815 9750 | 9931 | 97846 | 9373 968 

6808 8840 9756 9982 9°851 9574 9164 

6844 8365 9761 9934 9°856 9875 9166 

6879 8389 9319 9767 9936 92861 9576 9167 


Note. Decimal points in the body of the table are omitted. Repeated 9's are indicated by powers, e.g. 9³71 stands for 0.99971.

APPENDIX


THE DISTRIBUTION FUNCTION or y = 


оооооосооо 


ъа د‎ de сок 
56 ت ت ف خث ف فت ۾‎ 
2-26 بت خڅ ةة‎ S e ف ن ف ف ن رفع م غ ت‎ û 
зетеосљ=зофа 


برت ت اا ت وای و т жән‏ 

жыр зш ыш د‎ шз шз Bo BO LO IO tO tG tO RO ا‎ t 
А здру 

ج ر ج جا ج رج و ج ج وج 


SAAR SoH 


canann 


0:865 

"858 0-868 0-875 0-879 0-883 

0-791 0-838 0:85: В 5 22955 Cane 
0-813 0-804 0-885 0-896 0-903 


^ М 0-939 0-943 
0-846 0-901 0-923 0-935 0-942 м 


55 0-960 -963 
0:864 0-921 0-942 0-954 0:9605 | 0-965 0-968 
0:8695 | 0-926 0:9475 0:9585 0-965 it 


К 073 | 0.976 
0879 | 0935 | 0-050 | 0-967 | 0.973 
0882 | 0930 | 0060 | 0-970 | 007% 0-980 | 0.082 
0887 | 0913 | 0903 | 0.973 0.070 0-982 | 0-985 
080 | 0446 | 0-960 | 0976 0.951 0-984 | 0:087 
0894 | 0949 | 0969 | 0-978 0.983 


M 0-985 0-988 0:990 
0-901 0:955 0-973 0-982 0-987 0-989 0-991 
0-904 0:957 0:975 0-983* | 0-988 
0-906 0-960 0-977 0-985 0-989 0-992 0-993 
0-909 0-962 0-979 0-986 


| 
0990 | 0-993 | 0-994 | 
0:911 0-988 | 0-991 | 6-994 0-995 | 
0-014 0-989 | 0-992 | 0.004 | 00099 | 
0-916 0990 | 0-993 | 0.005 | 0.906 

0-918 0990 | 0904 | 0.005: | 0.997 

0-920 9991 | 0991 | 0.996 | 0007 

0-928 0992 | 0-995 | 0596 | 0.997 

0-924 0993 | 0005 | 0997 | 0.905 

0-926 0403 | 0006 | 0.997 | 0.008 

0-92 0-994 | 0-996 | 0.0975 0.998 

0-929 0-994 | 0-096" | 0.008 | 0.908 

0-930 0995 | 0007 | oops | 0.095 

0-932 0905 | 0997 | 0:998- 0.909 

0-933 0995 | 0007 | 0.998 | 0.999 

0-935 0996 | 0-908 | 0.998. 0-099. 

0-036 04% | 0008 0.999 | 0.999 

0-037 9998 | 0998 | оооу 0.999 

0-938 0-9965 | 0-998 | 0.900 0-999 " 
0-9395 0997 | 0008 | 0.999 | 0.999 

0-941 $297 | 0-898 | 0-999 | 0.095 

0-042 0997 | 0-998" | 0000 | 0000, 

0-943 0997 | 0909 | 0000 | 0999: E 
0-944 097% | 0099 | o.999 | 1.000 | 
0-945 0998 | 0-999 | 0.999 

0-946 0-998 | 0-999 | 0.999 

0-947 9-998 | 0-999 | 0.9995 

0:947 


0-998 0-999 0-9995 


Note.—For the test of 
The terminal small 
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P Biven in 4.15 у should be 
type 5 means that the four- 


TABLE 4 


FOR VALUES OF у FROM 1 то 20 


k п. 12. 13. 14. 15. 16. 17. 18. 19. 20. 
9 0-500 | 0-500 | 0-500 
0-1 3 52 -5: 
02 
02 
04 
0-5 
0-6 
0-7 
0:8 
09 
10 
11 
12 
13 
14 
15 
16 
17 
18 
1-9 
20 
24 
22 
93 
24 
25 
26 
28 
29 
3-0 0-995 
3-1 0-006 
32 0-9905 
33 0:997 
34 0:997 0-998 
25 0-998 | 0-998 
3:6. 0-998 0-998 
37 0-998* | 0-999 
38 0-999 | 04 
3-9 0. 0-099 | 9.9999 
40 0-999 о 0-099* | 1-000 
ES 0.000 0-999 1-000 
42 0-999 0:9995 
43 0-999 1-000 
44 995 
45 0-9095 
| 46 1-000 


taken as n − 2, where n is the number in the ranking.
figure values end in 5 and cannot be rounded off.

APPENDIX TABLE 5A

CONCORDANCE COEFFICIENT W. PROBABILITY THAT A GIVEN VALUE OF S WILL BE ATTAINED OR EXCEEDED FOR n = 3 AND VALUES OF m FROM 2 TO 10

Values of m


8 9 10 
| d E 
| 
1:000 | 1-000 | 1-000 | 1-000 | 1-000 1:000 1:000 1:000 
0-944 | 0-031 | 0-054 | 0-956 | 0-964 0-967 0-971 0:974 
0:528 | 0-653 | 0-691 | 0-740 0-768 | 0-794 0:814 0:830 
0-361 | 0-431 | 0-522 | 0-570 0:620 | 0-654 0:685 0-710 
0-194 | 0-273 | 0-367 | 0-430 0-486 | 0-531 0-569 0-601 
0-028 | 0-125 | 0-182 | 0.252 0:305 0-355 0-398 0:436 
0-069 | 0-124 |0:184 | 0-237 0-285 0-328 0-368 
0-042 | 0-093 | 0-142 | 0-192 0-236 0-278 0-316 
0:0046 | 0-039 | 0-072 | 0-112 0-149 0-187 0-222 
0-024 | 0-052 | 0-085 0-120 0-154. 0-187 
0:0085 | 0-029 | 0-051 0-079 0-107 0-135 
| 0:0277 | 0-012 | 0-027 0-047 0-069 0:092 
| 0-0081 | 0-021 0-038 0-057 0:078 
0:0055 | 0-016 0-030 0-048 0-066 
| 0:0017 | 0-0084 | 0-018 0-031 0-046 
| 0.0513 0:0036 | 0.0099 0:019 0-030 
0:0027 | 0-0080 0-016 0-026 
0:0012 | 0-0048 | 0:010 0:018 
0-0°32 | 0-0024 0:0060 | 0-012 
0-0°32 | 0.0011 0-0035 | 0-0075 
| 0-0421 | 0-0286 0:0029 | 0-0063 
| 0-0?26 | 0-0013 | 0-0034 
0:0%61 | 0-0?6G | 0-0020 
0-061 | 0-0?35 | 0-0013 
0:0%61 | 0-0?20 | 0-0383 
00536 | 0-0497 | 0-0951 
0-0454 | 0-0337 
0-0411 0-0?18 
0-0411 | 0-0?11 
0-011 | 0-0485 
0-0411 | 0-044 
0-0°60 | 0-0420 
0۰0411 
0:0521 
0:0799

APPENDIX TABLE 5B

CONCORDANCE COEFFICIENT W. PROBABILITY THAT A GIVEN VALUE OF S WILL BE ATTAINED OR EXCEEDED FOR n = 4 AND m = 3 AND 5

S m = 3 m = 5 S m = 5
1 1:000 1:000 61 0:055 
3 0-958 0-975 65 0:044 
5 0:910 0:944 67 0:034 
9 0-727 0:857 69 0:031 

11 0-608 0-771 78 0:028 

18 0:524 0:709 75 0-020 

17 0-446 0-652 77 0:017 

19 0:342 0:561 81 0:012 

21 0-300 0-521 . 88 0:0087 

25 0-207 0-445 85 0:0067 

27 0:175 0-408 89 0:0055 

29 0:148 0-372 91 0-0031 

33 0-075 0-298 93 0.0023. 

35 0-054. 0-260 97 0-0! 

37 | 0:038 0:226 99 0-001 
41 0-017 0-210 101 0:0014 
48 | 0:0017 0:162 105 0:064 
45 0:0017 0-141 107 0-0?33 
49 0-123 109 0۰0321 
51 0-107 113 0-0°14 
53 0-093 117 0۰0448 
57 0-075 125 0-0530 

| 59 E 0-067 


APPENDIX TABLE 5C

CONCORDANCE COEFFICIENT W. PROBABILITY THAT A GIVEN VALUE OF S WILL BE ATTAINED OR EXCEEDED FOR n = 4 AND m = 2, 4 AND 6

S m = 2 m = 4 m = 6 S m = 6
о 1-000 1-000 1-000 82 0-035 
2 0-958 0-992 | 0-996 84 0-032 
4 0-833 0:928 | 0.957 80 0:029 
6 0-792 0:900 0-940 88 0:023 
8 0:625 0-800 0.874 90 0:022 

10 0-542 0:754 0-844. 94 0:017 

12 0:458 0:677 0:789 96 0:014 

14 0-375 0-649 0-772 98 0-013 

16 0-208 0-524 0-679 100 0-010 

18 0-167 0-508 0-668 102 0-0096 

20 0-042 0-432 0-609 104 0۰0085 

22 0-389 0-574. 106 0-0073 

24 0-355 0-541 108 0-0061 

26 0-324 0-512 110 0-0057 
30 0-242 0-491 114 0-0040 
32 0-200 0-386 116 0:0088 
за 0-190 0:375 118 0:0028 
86 0:158 0-338 120 0۰0023 
38 0-141 0-317 122 0-0020 
40 0:105 0:270 126 0-0015 

42 0-094 0-256 128 0-0°90 

44 0:077 0:230 180 0-0°87 

46 0-068 0:218 182 0.073 

48 0-054 0-197 184 0:0265 
50 0:052 0:194 186 0-0740 
52 0-036 0-163 188 0-0?36 
54 0-033 0-155 03% 

140 0-0228 

56 0-019 0:197 а 

144 0:0324 

58 0-014, 0-114 146 0:0222 
р 0-012 б = 
62 0-108 148 0-0312 

64 0:0069 0-089 5 S 

i 150 0:0195 

66 0:0062 0-088 2 aga 

68 0.0027 0 452 0:0462 

“078 154 0-0446 
'| 0-0027 
70 0-066 158 0-0424 
72 9.0016 0-060 160 а 
0:0394 88 0:0416 
74 0-056 162 " 
0:0294, , 0-0412 
76 0-043 164 
78 0-0794 0-041 170 hen 
0-0472 0. а 
80 087 180 0:0%13
S m = 3 S m = 3


1-000 44 0-236 
1:000 46 0-213 
0-988 48 0-172 
0-972 50 0-163 
0:941 52 0-127 
10 0-914 54 0117 
12 0-845 56 0-096 
14 0-831 58 0-080 


carne 


60 0-063 
62 0-056 
64 0:045 
66 0:038 
68 0-028 
70 0-026 
72 0:017 
та 0:015 
76 0:0078 
78 0:0053 
80 0:0040 
82 0:0028 
86 0۰0390 
90 0۰069 


. 
APPENDIX TABLE 5D
CONCORDANCE COEFFICIENT W. PROBABILITY THAT A GIVEN VALUE OF S WILL BE ATTAINED OR EXCEEDED FOR n = 5 AND m = 3

APPENDIX TABLE 6

SIGNIFICANCE POINTS OF S (FOR THE COEFFICIENT OF CONCORDANCE W)

From Friedman (1940) by permission of the author and the Editor of the Annals of Mathematical Statistics


— 


Additional values 
n for n = 3 
2. a? 
8 4 5 6 7 т 5 
Values at 0:05 Level of Significance 
3 64-4 | 103-9 157-8 | 9 54-0 
4 49-5 884 | 1483 217-0 12 71:9 
5 62-6 | 112-3 | 1824 276-2 14 83-8 
6 75-7 | 1861 | 221.4 335-2 16 95-8 
8 481 | 1017 | 183.7 299.0 458-1 18 107-7 
10 60-0 | 1278 | 231.2 | 3767 571-0 
15 898 | 192-9 | 349.8 | 570.5 864-9 
20 1197 | 258-0 | 468:5 764-4 | 1158.7 


Values at 0-01 Level of Significance 


8 7556 | 1228 | 1856 5:9 
4 61-4 | 1098 | 176.9 265-0 12 1085 
5 80-5 | 1428 | 2959.4 848-8 14 121-9 
6 99-5 | 176-1 282-4 422.6 16 140-2 
8 66-8 187-4 2427 888-8 579-9 18 158-6 

10 851 | 1753 | 309-1 | 4040 737-0 

15 1810 | 2698 | 4752 788.2 1129.5 

20 1770 | 364-2 | 641-2 


1022-2 | 1591.9

APPENDIX TABLE 7A
5 PER CENT POINTS OF THE DISTRIBUTION OF z

Reprinted from Table VI of Prof. R. A. Fisher's Statistical Methods for Research Workers, Oliver & Boyd, Ltd., Edinburgh, by permission of the author and publishers

Values of ν₁.

8. 12. 24. ∞.


2.6870 | 2-7071 | 2,7194 | 2:7276 2.7380 | 2-7484 | 9.7588 | 2.7693 
1-4800 | 1-4808 | 1-4819 1.4830 | 1:4840 | 14851 
1.0994 | 1.0953 | 1.0899 1:0842' | 1-0781-| 1:0716 
0-9168 | 0-9093 | 0-8993 0.8885 | 0:8767 | 0:8639 
0-8097 | 0-7997 0.7862 0-7714 | 0-7550 | 0:7368 
0.7558 | 0-7394 | 0-7274 | 0:7112 0.6931 | 0-6729 0-8499 
0-6896 | 0-6761 | 0-6576 | 0.0369 0.6134 | 0:5862 
0.6525 | 0-6378 | 0-6175 | 0:5945 0.5682 | 0.5371 
0.6238 | 0-6080 | 0-5862 | 0-5613 0.5324 | 0-4979 
0-6009 | 0.5843 0:5035 | 0-4657 


abel 
28 8 2 
35БЕ 
S25 
өзе” 
сотту 
REESE 
28882 
8888 
cont 
honk 
2888 
— 8 8 8 
EKE 
cont 
ЕРЕ 
8855 
5554 
8822 


Sons n e 


© 
а 
е 
е 
а 
E 
= 
© 


0:5322 | 0-5648 | 0-5406 | 0:5126 0-4795 | 0-4387 
0-5666 | 0-5487 0.5234 0-494) | 0-4592 | 0-4156 
0-5535 | 0-5350 | 0-5089 | 0-4785 0-4419 | 0-3957 
0:5423 | 0.5233 0-4269 | 0:3782 
585 | 0:5326 | 0-5131 04138 | 0:3628 
505 0-5241 | 0-5042 0-4022 | 0:3490 
434 | 0-5166 | 0:4964 0:3919 | 0:3366 


coco ooccc 


0:7630 


8 * 
Ф 
ue 
E 
= 
E 
е 
ё 
5 
= 


0:7514 | 0-6451 


Values of va. 
= 


0:6393 i 
0-6341 371 | 0-5099 | 0-4894 0-4602 | 0-4255 | 0:3827 0.3253 
0-6295 315 | 0-5040 | 0-4832 | 0-4535 0-4182 | 0-3743 | 0-3151 


265 | 0-4986 | 0-4776 | 0-4474 0-4116 | 0-3608 | 0-3057 


0.6254 


0.5219 | 0-4938 | 0-4725 | 0-4420 0-4055 | 0-3599 | 0-2971 
0.5178 | 0-4894 | 0-4679 | 0-4370 0.4001 | 0:3536 | 0-2892 
0.5140 | 0-4854 | 0-4636 | 0:4325 0-3950 | 0-3478 | 0:2818 
0-5106 | 0-4817 | 0-4598 0.4283 0-3904 | 0-3425 0:2749 
0-5074 | 0.4783 | 0-4562 0-4244 | 0-3862 | 0:3376 0:2685 
0-5045 | 0.4752 | 0:4529 | 0-4209 | 0.3823 | 0:3330 | 0:2625 
0-5017 | 0:4723 | 0-4499 1 0.4176 | 0:3786 | 0:3287 0.2569 
0-4992 | 0-4696 | 0-4471 | 0-4146 | 0-3752 | 0:3248 0.2516 

| 

1 


d 0-4969 | 0-4671 0-4444 | 0-4117 | 0-3720 0-3211 | 0:2466 
0-5904 | 0-5362 | 04947 0-4648 | 0-4420 | 0-4090 0-3691 | 0-3176 | 0-2419 


0-5738 | 0-5073 | 0-4632 0-4311 | 0-4064 | 0:3702 | 0.3255 0:2654 | 0-1644 


0-3309 | 0-2804 | 0-2085 | 0 


0:5486 | 0-4787 | 0-4319 0-3974 | — 25 


APPENDIX TABLE 7B

1 PER CENT POINTS OF THE DISTRIBUTION OF z

Reprinted from Table VI of Prof. R. A. Fisher's Statistical Methods for Research Workers, Oliver & Boyd, Ltd., Edinburgh, by permission of the author and publishers

Values of ν₁.


| 1 
T 2 3. 4. 5. 6. 8 12. | 24 
i = ЕЕ 

1 | 41535 | 4-2585 | 4-2974 4:3482 4-3089 | 413794 
2 2.2950 | 2-2976 | 2.2984 2-2994 22999 | 2-3001 
3 | 17649 | 1.7140 | 1-6915 1-6569 1-6404 | 1-6314 
4 1.5270 1-4452 | 1-4075 1:3473 | 13327 | 1-3170 | 1-3000 
-5 | 1-3943 | 1-2929 1-2449 1-1656 | 1-1457 | 1-1239 1-0997 
6 | 13103 | 1-1955 | 1-1401 1-0460 1-0218 | 0-9948 0-9643 
7 | 1-2526 | 1-1281 | 1-0672 0-9614 | 0-9 0-9020 | 0-8658 
8 | 1-2106 | 1-0787 1-0135 0-8983 0-8673 | 0-8319 | 0-7904 
9 | 1-1786 | 1-0411 ! 0-9724 0-8494 | 0-8157 0-7769 | 0-7305 
10 | 11535 | 1-0114 0-9399 0-8104 | 0-7744 0-7324 | 0-6816 

11 | 1-1333 | 0-9874 | 0-9136 0-7785 | 0-7405 ; 
.12 | 1-1166 | 0-9677 | 0-8919 0-7520 94122 D 910901 
13 | 1-1027 | 09511 | 0.8737 0:7295 | 0-0882 | 0.6386 | 0-5761 
# | 14 | 10909 | 0-9370 | 0-8581 0:7103 | 0-6675 | 0-6159 | 0-5500 
« | 15 | 1.0807 | 0-9249 | 08448 0:6937 | 0-6496 | 0-5961 | 0.5900 
9 | 16 | 1-0719 | 0-9144 | 0-8331 9-6791 | 0-6339 | 0.5786 | 0.5064 
8 | 17 | 1-0641.| 0-0051.) 0.8229 0:6663 | 0-6199 | 0:5630 | 0.1870 
5 | 18 | L0572 | 0-8070 | 0.8138 0-6549 | 0-6075 | 0-5491 | 0.4712 
= |19 | L0511 | 0-8897 | 0-8057 0:6447 0-5366 | 0-4560 
20 | 10457 | 0:8831 | 0-7085. | 0-7443 | 0.7058 | 0.0768 | 0.6355 0-5253 | 0-4491 
22 10408 | 0:8772 | 0-7920 | 0:7372 | 0-984 | о.6600 | 0.0272 05150 | 0-42 
23 10352 | 0-8718 | 0-7860 | 0-7309 | 0-6916 | 0.6020 | 00272 0.9090 9990 
23 | 10322 | 0-8670 | 0-7806 | 0.7251. 0-6855 0.6955 9.6127 9.4009 | 0-4008 
24 | 10285 | 0-8626 | 07757. 0-7197 | 0-6799 0.0490 0-6064 0-4890 | 0:3967 
25 | 10251 | 0-8585 | 0:7712 | 0:7148 | 0.6147 |0444 0:6006 0-4816 | 0-3872 
26 | 1:0220 | 0:8548 | 0-7670 | 0-7103 | 0-6699 | 0.6302 0:5952 | 0-5422 | 0.4748 | 0.3794 
27 | 10191 | 0-8513 | 0-7631 | 0-7062 | 0-6655 | 0.6346 0:5902 | 0:5367 | 0-4685 | 0.3701 
38 | 1:0164 | 0:8481 | 0-7595 | 0-7023 | 0-6614 | 0.0303 0:5856 0-4626 | 0.3024 
29 | 1:0139 | 0-8451 | 0-7562 | 0-6987 | 0.6576 | 0.0203 0-5813 | 0. 04510 | 0.3550 
| 90 | 10116 | 0:8423 } 0-7531 | 0-6954 | 0:6540 | 0.625 055713 | 0-5224 | 0-4519 | 0.3181 

i 

60 | 00784 | 0:8025"; 0-7086 | 0-6472 | 0.0028 | 0.5687 0:5189 | 0-4574 | 0-3746 | 0.0359 

20 | 0:9462 | 0:7636 | 0-6651 | 0:5999 | 0-5522 | 0.5152 0-4604 | 0-3908 | 0-2913 | o 
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PAIRED COMPARISONS. 


APPENDIX TABLE 9


FREQUENCY (f) OF VALUES OF d AND PROBABILITY (P) THAT VALUES WILL BE ATTAINED OR EXCEEDED


Value |   n = 2   |   n = 3   |   n = 4    |     n = 5     |     n = 6      |       n = 7
of d  | f |   P   | f |   P   |  f |   P   |     f |   P   |      f |   P   |         f |   P
   0  | 2 | 1.000 | 6 | 1.000 | 24 | 1.000 |   120 | 1.000 |    720 | 1.000 |     5,040 | 1.000
   1  |   |       | 2 | 0.250 | 16 | 0.625 |   120 | 0.883 |    960 | 0.978 |     8,400 | 0.998
   2  |   |       |   |       | 24 | 0.375 |   240 | 0.766 |  2,240 | 0.949 |    21,840 | 0.994
   3  |   |       |   |       |    |       |   240 | 0.531 |  2,880 | 0.880 |    33,600 | 0.983
   4  |   |       |   |       |    |       |   280 | 0.297 |  6,240 | 0.792 |    75,600 | 0.967
   5  |   |       |   |       |    |       |    24 | 0.023 |  3,648 | 0.602 |    90,384 | 0.931
   6  |   |       |   |       |    |       |       |       |  8,640 | 0.491 |   179,760 | 0.888
   7  |   |       |   |       |    |       |       |       |  4,800 | 0.227 |   188,160 | 0.802
   8  |   |       |   |       |    |       |       |       |  2,640 | 0.081 |   277,200 | 0.713
   9  |   |       |   |       |    |       |       |       |        |       |   280,560 | 0.580
  10  |   |       |   |       |    |       |       |       |        |       |   384,048 | 0.447
  11  |   |       |   |       |    |       |       |       |        |       |   244,160 | 0.263
  12  |   |       |   |       |    |       |       |       |        |       |   233,520 | 0.147
  13  |   |       |   |       |    |       |       |       |        |       |    72,240 | 0.036
  14  |   |       |   |       |    |       |       |       |        |       |     2,640 | 0.001
TOTAL | 2 |   -   | 8 |   -   | 64 |   -   | 1,024 |   -   | 32,768 |   -   | 2,097,152 |   -
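The frequencies in Appendix Table 9 can be checked for small n by direct enumeration. The sketch below is only an illustration and not necessarily the way the table was compiled. It assumes that each of the n(n-1)/2 paired comparisons is equally likely to go either way, runs through all 2^(n(n-1)/2) sets of preferences, and counts the circular triads d in each. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def circular_triad_counts(n):
    """Frequency of each number d of circular triads over all outcomes for n objects."""
    pairs = list(combinations(range(n), 2))
    triples = list(combinations(range(n), 3))
    counts = Counter()
    for outcome in product((False, True), repeat=len(pairs)):
        # beats[i][j] is True when object i is preferred to object j
        beats = [[False] * n for _ in range(n)]
        for (i, j), i_preferred in zip(pairs, outcome):
            beats[i][j] = i_preferred
            beats[j][i] = not i_preferred
        # a triad is circular when i>j, j>k, k>i or the reverse cycle holds
        d = sum(beats[i][j] == beats[j][k] == beats[k][i] for i, j, k in triples)
        counts[d] += 1
    return dict(sorted(counts.items()))


def tail_probabilities(counts):
    """P(d or more circular triads) for each attainable d."""
    total = sum(counts.values())
    tail, running = {}, 0
    for d in sorted(counts, reverse=True):
        running += counts[d]
        tail[d] = running / total
    return dict(sorted(tail.items()))


if __name__ == "__main__":
    for n in range(2, 6):
        freq = circular_triad_counts(n)
        print(n, freq, tail_probabilities(freq))
```

Enumeration is practical only for small n, since the number of outcomes doubles with every additional pair.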
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APPENDIX TABLE 10a 


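AGREEMENT IN PAIRED COMPARISONS. THE PROBABILITY P THAT A VALUE
OF Σ (FOR u) WILL BE ATTAINED OR EXCEEDED, FOR m = 3, n = 2 TO 8


   n = 2     |    n = 3     |    n = 4     |    n = 5     |    n = 6     |    n = 7     |    n = 8
 Σ | P       |  Σ | P       |  Σ | P       |  Σ | P       |  Σ | P       |  Σ | P       |  Σ | P
 1 | 1.000   |  3 | 1.000   |  6 | 1.000   | 10 | 1.000   | 15 | 1.000   | 21 | 1.000   | 28 | 1.000
 3 | 0.250   |  5 | 0.578   |  8 | 0.822   | 12 | 0.944   | 17 | 0.987   | 23 | 0.998   | 30 | 1.000
   |         |  7 | 0.156   | 10 | 0.466   | 14 | 0.756   | 19 | 0.920   | 25 | 0.981   | 32 | 0.997
   |         |  9 | 0.016   | 12 | 0.169   | 16 | 0.474   | 21 | 0.764   | 27 | 0.925   | 34 | 0.983
   |         |    |         | 14 | 0.038   | 18 | 0.224   | 23 | 0.539   | 29 | 0.808   | 36 | 0.945
   |         |    |         | 16 | 0.0046  | 20 | 0.078   | 25 | 0.314   | 31 | 0.633   | 38 | 0.865
   |         |    |         | 18 | 0.0³24  | 22 | 0.020   | 27 | 0.148   | 33 | 0.433   | 40 | 0.736
   |         |    |         |    |         | 24 | 0.0035  | 29 | 0.057   | 35 | 0.256   | 42 | 0.572
   |         |    |         |    |         | 26 | 0.0³42  | 31 | 0.017   | 37 | 0.130   | 44 | 0.400
   |         |    |         |    |         | 28 | 0.0⁴30  | 33 | 0.0042  | 39 | 0.056   | 46 | 0.250
   |         |    |         |    |         | 30 | 0.0⁶95  | 35 | 0.0³79  | 41 | 0.021   | 48 | 0.138
   |         |    |         |    |         |    |         | 37 | 0.0³12  | 43 | 0.0064  | 50 | 0.068
   |         |    |         |    |         |    |         | 39 | 0.0⁴12  | 45 | 0.0017  | 52 | 0.029
   |         |    |         |    |         |    |         | 41 | 0.0⁶92  | 47 | 0.0³37  | 54 | 0.011
   |         |    |         |    |         |    |         | 43 | 0.0⁷43  | 49 | 0.0⁴68  | 56 | 0.0038
   |         |    |         |    |         |    |         | 45 | 0.0⁹93  | 51 | 0.0⁴10  | 58 | 0.0011
   |         |    |         |    |         |    |         |    |         | 53 | 0.0⁵12  | 60 | 0.0³29
   |         |    |         |    |         |    |         |    |         | 55 | 0.0⁶12  | 62 | 0.0⁴66
   |         |    |         |    |         |    |         |    |         | 57 | 0.0⁸86  | 64 | 0.0⁴13
   |         |    |         |    |         |    |         |    |         | 59 | 0.0⁹44  | 66 | 0.0⁵22
   |         |    |         |    |         |    |         |    |         | 61 | 0.0¹⁰15 | 68 | 0.0⁶32
   |         |    |         |    |         |    |         |    |         | 63 | 0.0¹²23 | 70 | 0.0⁷40
   |         |    |         |    |         |    |         |    |         |    |         | 72 | 0.0⁸42
   |         |    |         |    |         |    |         |    |         |    |         | 74 | 0.0⁹36
   |         |    |         |    |         |    |         |    |         |    |         | 76 | 0.0¹⁰24
   |         |    |         |    |         |    |         |    |         |    |         | 78 | 0.0¹¹13
   |         |    |         |    |         |    |         |    |         |    |         | 80 | 0.0¹³48
   |         |    |         |    |         |    |         |    |         |    |         | 82 | 0.0¹⁴12
   |         |    |         |    |         |    |         |    |         |    |         | 84 | 0.0¹⁶14

A superscript gives the number of zeros following the decimal point, e.g. 0.0³24 = 0.00024.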
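The probabilities tabulated for Σ can be reproduced exactly for small m and n. The following is a minimal sketch under two stated assumptions: that Σ is the sum, over all pairs of objects, of C(γ, 2) + C(m - γ, 2), where γ of the m observers prefer the first object of the pair (this definition is taken from the treatment of the coefficient of agreement and is not restated in this appendix), and that every observer's preference on every pair is random with probability 1/2. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def pair_distribution(m):
    """Distribution of one pair's contribution C(g,2) + C(m-g,2), g ~ Binomial(m, 1/2)."""
    dist = {}
    for g in range(m + 1):
        value = comb(g, 2) + comb(m - g, 2)
        dist[value] = dist.get(value, Fraction(0)) + Fraction(comb(m, g), 2 ** m)
    return dist


def sigma_tail(m, n):
    """Exact P(Sigma >= s) for m observers and n objects, for every attainable s."""
    single = pair_distribution(m)
    total = {0: Fraction(1)}
    for _ in range(comb(n, 2)):  # convolve over the n(n-1)/2 independent pairs
        step = {}
        for s, p in total.items():
            for v, q in single.items():
                step[s + v] = step.get(s + v, Fraction(0)) + p * q
        total = step
    tail, running = {}, Fraction(0)
    for s in sorted(total, reverse=True):
        running += total[s]
        tail[s] = running
    return dict(sorted(tail.items()))


if __name__ == "__main__":
    for s, p in sigma_tail(3, 3).items():
        print(s, f"{float(p):.3f}")
```

For m = 3 and n = 3 this prints 1.000, 0.578, 0.156 and 0.016 against Σ = 3, 5, 7 and 9. Exact fractions are used so that the very small tail probabilities for larger n are not lost to rounding.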
APPENDIX TABLE 10b


AGREEMENT IN PAIRED COMPARISONS. THE PROBABILITY P THAT A
VALUE OF Σ (FOR u) WILL BE ATTAINED OR EXCEEDED, FOR m = 4
AND n = 2 TO 6 (FOR n = 6 ONLY VALUES BEYOND THE 1 PER CENT
POINT ARE GIVEN)
55е Шыр 8 * n=5 * =5 n=6 n = 
: ее Ра Sere Т pots) р | је» 
5 == Ма |8 2. | ч А j 
АШЫ 100 00048 | 57 | 0-014 | 79 | 0-042 
À 5 10025 0:0030 | 58 | 0-0092.| 80 | 0-0*28 
ae | за | 0.0017 | 59 | 0-0058 | 81 | 0:0%08 
2 | 0-0?73 | 60 | 0-0037 | 82 0:0915 
S 0-0^41 | 61 | 0-0022 | 83 | 0.012 
Ne | 47 | 0-0324 | 62 | 0-0013 | 84 0-010651 
| 0-090 | 63 | 0:0276 | 86 | 0-01130 
aW 0797 | 64 | 0-044 | 87 | 001117 
T 5 | 65 | 00293 | 90 | 0:0128 
| 0-0593 | 66 | 0-0?13 
0-0521 | 67 | 0-0172 
— 0:0517 | 68 0.0730 
54 0.074 | 69 | 0:0:18 
т 56 | 0-0766 | 70 | 0-0507 
" 57 |0-0738 | 71 |.0-0547 
Е 60 0.093 | 72 0.0520 
" ) 73 | 0-0510 
A 0:038 4 ` 74 | 00°51 
М 7 У ^^ |75 0-0618 
{Т 80 |.0:024, |..." 76 | 0.0778 
c И 40 | 0-016 | TT | 0-0744. 
xr 41 | 0-0088 | 78 | 0-015 


APPENDIX TABLE 10c


AGREEMENT IN PAIRED COMPARISONS. THE PROBABILITY P THAT A
VALUE OF Σ (FOR u) WILL BE ATTAINED OR EXCEEDED, FOR m = 5
AND n = 2 TO 5
| | 4 5. = 
n=2 =3 | n=4 n=5 
= Р = Р 2: P z P iz P 
| А 
4 1-000 12 | 1-000 24 276 | 0-0450 
6 0-375 14 | 0-756 26 78. | 0-016 
10 0-063 16 | 0:390 28 80 | 0:0550 
18 | 0:207 30 82 | 00515 
20 | 0-103 32 84 | 0:0639 
22 | 0-030 34 86 0.0610 
24 | 0-011 36 88 | 0.023. 
96 | 0.0039 | -38 90 | 0-0553 
30 | 0.0724 | 40 | 0:024 92 | 002 | 
0-0092 94 | 00°14 
0-0036 96- | 0046 
0:0012 100 | 0091 
| 0-0836 
E 0:0312 © 
AU | 0.028, | GS | 0:0025 
| 0.0554. | 70 | 0:0010 
0-0518 | 72 | 0:0839 
0-0760 | 74 | 00714 
APPENDIX TABLE 10d


AGREEMENT IN PAIRED COMPARISONS. THE PROBABILITY P THAT A
VALUE OF Σ (FOR u) WILL BE ATTAINED OR EXCEEDED, FOR m = 6
AND n = 2 TO 4
п=г n=4 n=4 n=4 
| 
— Р “Ж Р 2 Р = P 2 Р 
6 17.000 18 1-000 74 | 00:12 
7 0-688 | 19 0-999 75 0:0599: 
10 0210 20 0-991 76 | 0:0549 
15 | 0031 2r 0-959 77 | 0.0532 
ғ 22 0-896 80 | 0:0%68 
23 0-822 81 | 0-0917 
Р 24 0-755 82 | 00°12 
T 20 0:669 85 0.0784 
27 0-556 90 | 0.0893 
28 0-466 
29 0-409 
30 0-337 
бі 0-257 
92. 0-209 
^ 35 0-175 
36 0-133 
37 0-097 | 71 
40 0-073 | 72 0-0448 
45 0:057 | 73 00416 
(Greek letters are indexed under their spelling in Roman letters, e.g. τ under tau.
References are to pages.) 


Agreement, coefficient of, 125-31. 
Babington Smith, B., refs., 89, 131. 


Chi-squared (χ²), in testing coefficient of concordance, 84-6, 95-6; in partial τ, 101; in paired comparisons, 124, 130; fitted to distribution of circular triads, 133-35, 136-8.

Circular triads, 123-5, 132-8.

Concordance, coefficient of, 81; relation with ρ, 81-2; significance of, 84-6; non-null case, 88-9.

Confidence intervals, for τ, 52-3.

Conjugate rankings, 10-11, 21.

Consistence, coefficient of, 123-5.

Continuity, correction for, see Correction.

Correction for continuity, 41-6, 68-9, 85-6.


Daniels, H. E., 60, 65, 76; refs., 24, 
54, 79. 

Dantzig, G. B., refs., 54. 

Dichotomy, as a ranking, 32-6, 43-6. 

Disarray, τ as coefficient of, 7-8, 21-3.


Dubois, P., refs., 36.


Esscher, F., 106; refs., 16, 111.
Estimation, 86-9, 97-8.


Feller, W., refs., 16. 
Fisher's z-distribution, 84. 


Friedman, M., refs., 89. 




General rank correlation, 17. 
Grade correlation, 109, 117-18. 
Greiner, R., 106; refs., 16, 111. 


Haden, H. G., refs., 16. 
Hoffding, W., refs., 24, 54, 104. 
Hotelling, H., refs., 111. 


Interchanges, see Disarray. 


Kendall, M. G., 16, 35, 52, 60, 67, 76, 90, 103, 137; refs., 16, 36, 54, 79, 89, 104, 131.


m rankings, problem of, 80-98.
Moran, P. A. P., 135; refs., 16, 131.


Non-null case, significance in, 49-54, 
69-79. 

Normal correlation, relation with 
rank correlation, 105-20. 


Olds, E. G., refs., 54. 


Pabst, M. R., refs., 111. 

Paired comparisons, 121-38. 

Partial rank correlation, 99-104. 

Pearson, K., 118; refs., 16, 111.

Pitman, E. J. G., 90; refs., 98.

Population and trade, Example 1.4, 14-15.

Preferences, 1; in paired comparisons, see Paired Comparisons.

Product-moment correlation, as particular case of general rank correlation, 19-20; between ρ and τ, 68.


Rho (ρ), definition, 8-9; relation with τ, 11; as particular case of general coefficient, 18-19; ties in, 29-32; significance of, 46-9; distribution in null case, 62-5; joint distribution with τ, 65-8; relation with coefficient of concordance, 81-2; relation with normal correlation, 108-11, 117-120.


Sillitto, G. P., refs., 36, 47.
Spearman, C., refs., 16. See also Rho.
Spearman's footrule, 23-4.

Stragglers, 11-13.


Student's t distribution, 48, 65; refs., 36.


Tau (τ), definition, 4-5; calculation of, 5-7; as coefficient of disarray, 7-8; relation with ρ, 11; as particular case of general coefficient, 17-18; ties in, 25-9; significance of, 37-41; in non-null case, 49-54, 69-79; distribution in null case, 55-62; joint distribution with ρ, 65-9; partial, 99-104; relation with normal correlation, 105-8, [illegible].

Tests of significance, 37-54, 54-79; of coefficient of concordance, 84-86, 90-6; of coefficient of agreement, 129-31.

Tied ranks, 25-36; in significance for τ, 43-6; in coefficient of concordance, 82-4, 86, 94-5; in estimation, 88-9; in paired comparisons, 128-9.


Trade and population, Example 1.4, 14-15.

Triads, in paired comparisons, 123-5,


Wallis, W. A., refs., 89.

Welch, B. L., refs., 98.

Whitfield, J. W., data from, 128; refs., 36.


Woodbury, M., refs., 36.


Yule, G. Udny, 17, 108.
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